Abstract. For the random 2-SAT formula $F(n,p)$, let $F_C(n,p)$ be the formula left after the pure
literal algorithm applied to $F(n,p)$ stops. Using the recently developed Poisson cloning model
together with the cut-off line algorithm (COLA), we completely analyze the structure of $F_C(n,p)$.
In particular, it is shown that, for $\lambda := p(2n-1) = 1+\sigma$ with $\sigma \gg n^{-1/3}$, the core of $F(n,p)$ has
$\theta_\lambda^2 n + O((\theta_\lambda n)^{1/2})$ variables and $\theta_\lambda^2\lambda n + O((\theta_\lambda n)^{1/2})$ clauses, with high probability, where $\theta_\lambda$ is the
larger solution of the equation $\theta - (1 - e^{-\theta\lambda}) = 0$. We also estimate the probability of $F(n,p)$
being satisfiable to obtain

$$\Pr\big[F_2(n, \tfrac{\lambda}{2n-1}) \text{ is satisfiable}\big] = \begin{cases} 1 - \dfrac{1+o(1)}{16\sigma^3 n} & \text{if } \lambda = 1-\sigma \text{ with } \sigma \gg n^{-1/3}, \\[2mm] e^{-\Theta(\sigma^3 n)} & \text{if } \lambda = 1+\sigma \text{ with } \sigma \gg n^{-1/3}, \end{cases}$$

where $o(1)$ goes to 0 as $\sigma$ goes to 0. This improves the bounds of Bollobás et al. [8].



1 Introduction 

An instance of the satisfiability problem is given by a conjunctive normal form (CNF), that is, a
conjunction of disjunctions. Each disjunction, or clause, is of the form $(y_1 \vee \cdots \vee y_k)$, where the $y_i$'s
are chosen among $2n$ literals consisting of $n$ Boolean variables and their negations, conditioned that all $k$ literals are
strictly distinct, i.e., no literals with the same underlying variables appear more than once. The
problem is whether a given formula has an assignment of truth values (0 or 1) for the $n$ variables
that satisfies the formula. When such an assignment exists, the formula is called satisfiable. It is
unsatisfiable otherwise. It is now well-known that the satisfiability problem is NP-complete ([13]).
Even the $k$-satisfiability problem, in which each clause consists of exactly $k$ literals, is known to
be NP-complete for $k \ge 3$ ([13]). In the case of $k = 2$, there is a polynomial time algorithm [13] to
determine whether the instance of the 2-satisfiability problem is satisfiable or not.

The random $k$-SAT formula $F(n,p;k)$ on $n$ variables is the conjunction of clauses, each selected with
probability $p$ from the set of $2^k\binom{n}{k}$ possible clauses, independently of all others. Not surprisingly,
the random 2-SAT and the random 3-SAT formulae have been most extensively studied and many
research papers regarding the random models have been published. For $k = 2$, Chvátal and Reed
[12], Goerdt [21] and Fernandez de la Vega [H] independently proved that the random 2-SAT
problem undergoes a phase transition at 1, that is,

$$\lim_{n\to\infty} \Pr\big[F_2(n, \tfrac{\lambda}{2n-1}) \text{ is satisfiable}\big] = \begin{cases} 1 & \text{if } \lambda < 1, \\ 0 & \text{if } \lambda > 1. \end{cases}$$
Though there is no essential difference, we prefer $\lambda = p(2n-1)$ to $\lambda = 2pn$ because $p(2n-1)$
is the expected degree of each literal. Techniques used to prove the phase transition are
essentially based upon the first and the second moment methods for the number of certain structures
closely related to the satisfiability. Bollobás et al. [8] took much more sophisticated approaches to
determine the scaling window for the problem:

$$\Pr\big[F_2(n, \tfrac{\lambda}{2n-1}) \text{ is satisfiable}\big] = \begin{cases} 1 - \Theta\big(\tfrac{1}{\sigma^3 n}\big) & \text{if } \lambda = 1-\sigma \text{ with } \sigma \gg n^{-1/3}, \\ e^{-\Theta(\sigma^3 n)} & \text{if } \lambda = 1+\sigma \text{ with } \sigma \gg n^{-1/3}. \end{cases}$$

Though it is believed that the random $k$-SAT problem, $k \ge 3$, undergoes a similar phase
transition, it remains a conjecture. Only sharp transitions are known due to a seminal result
of Friedgut [17]. The upper and lower bounds for the critical value $\lambda_3$ (assuming the conjecture
is true for $k = 3$) have a colorful history. In a series of papers [IHl HS1 [231 HH ISSl [22l EH [23 US],
the upper bound of $\lambda_3$ has been improved to 4.506. There has been considerable work bounding
$\lambda_3$ from below too. The easiest but fundamental algorithm is the pure literal algorithm (PLA). A
literal is pure in a formula if it belongs to at least one clause of the formula, while its negation is in
no clause. The PLA keeps selecting a pure literal, setting it true, and removing clauses containing
the literal as they are already satisfied. This procedure may (or may not) yield new pure literals.
The algorithm stops when no pure literal is left. We say that the PLA succeeds if no clause
remains in the formula after it stops. Clearly, the formula is satisfiable if the PLA succeeds. The
converse is not true: for example, $(y \vee z) \wedge (\bar y \vee \bar z)$ is satisfiable whereas no pure literal exists.
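
The PLA as described above can be sketched in a few lines. This is only an illustrative sketch, not code from the paper: the function name `pure_literal_algorithm` and the DIMACS-style encoding (a literal is a nonzero integer and `-v` is the negation of `v`) are assumptions made here.

```python
from collections import defaultdict

def pure_literal_algorithm(clauses):
    """Run the pure literal algorithm on a CNF formula (illustrative sketch).

    clauses: iterable of tuples of nonzero ints; -v denotes the negation of v.
    Returns (residual_clauses, assignment), where assignment maps a variable
    to True/False for every literal the algorithm set true.
    """
    remaining = {i: tuple(c) for i, c in enumerate(clauses)}
    occ = defaultdict(set)  # literal -> indices of remaining clauses containing it
    for i, c in remaining.items():
        for lit in c:
            occ[lit].add(i)
    # A literal is pure if it occurs somewhere while its negation occurs nowhere.
    stack = [lit for lit in list(occ) if occ[lit] and not occ[-lit]]
    assignment = {}
    while stack:
        lit = stack.pop()
        if not occ[lit] or occ[-lit]:
            continue  # all of its clauses were already removed
        assignment[abs(lit)] = lit > 0
        for i in list(occ[lit]):  # remove every clause satisfied by lit
            for other in remaining.pop(i):
                occ[other].discard(i)
                # If `other` just vanished, its negation may have become pure.
                if other != lit and not occ[other] and occ[-other]:
                    stack.append(-other)
    return list(remaining.values()), assignment
```

The PLA succeeds exactly when the returned residual list is empty; on the example $(y \vee z) \wedge (\bar y \vee \bar z)$, encoded as `[(1, 2), (-1, -2)]`, both clauses are returned untouched.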

Broder, Frieze, and Upfal [TO] analyzed the PLA for the random 3-SAT problem to show that,
if $\lambda < 1.225$ then the PLA applied to F(n, ; 3) succeeds with high probability (whp), and if
$\lambda > 1.275$ then it fails whp. Mitzenmacher [28] used the differential equation method introduced
by Wormald [30] to claim that the threshold for the PLA exists and that it is the solution of certain
equations, which are somewhat complicated. That is, there is $\lambda(k)$, $k \ge 3$, so that the PLA applied
to F(n, —£=1 ; k) succeeds whp if $\lambda < \lambda(k)$, and fails whp if $\lambda > \lambda(k)$. It, however, remains unclear
whether this should be regarded as a rigorous proof.

A more advanced algorithm called the unit clause algorithm (UCA) and its variations are
analyzed [111 [21 \T[ 15] to eventually obtain the lower bound of 3.26. The UCA first chooses a literal
uniformly at random and sets it true. Then the negation of the literal is removed from the clauses
containing it so that each of them becomes a clause of length one less. If there are clauses of length 1, or unit
clauses, then the UCA chooses a clause uniformly at random among all unit clauses and sets the
literal in the chosen clause true. The negation of the literal is removed from the clauses containing
it. Thus, it is possible that a 0-clause, i.e., a clause without any literal, is created. The UCA
succeeds if no 0-clause is created.
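
A minimal sketch of the UCA may help fix the description. It is not taken from the analyzed papers, and it makes two assumptions that the text leaves implicit: clauses satisfied by the chosen literal are discarded, and whenever no unit clause is present a literal of a uniformly random unassigned variable is chosen. The name `unit_clause_algorithm` and the integer encoding of literals (`-v` negates `v`) are also illustrative.

```python
import random

def unit_clause_algorithm(clauses, n, rng=None):
    """Sketch of the unit clause algorithm on variables 1..n.

    Returns the assignment found, or None as soon as a 0-clause is created.
    """
    rng = rng or random.Random()
    clauses = [set(c) for c in clauses]
    unassigned = set(range(1, n + 1))
    assignment = {}
    while clauses:
        units = [c for c in clauses if len(c) == 1]
        if units:
            # Forced step: satisfy a uniformly random unit clause.
            lit = next(iter(rng.choice(units)))
        else:
            # Assumed rule: pick a random literal of a random unassigned variable.
            v = rng.choice(sorted(unassigned))
            lit = rng.choice((v, -v))
        assignment[abs(lit)] = lit > 0
        unassigned.discard(abs(lit))
        shortened = []
        for c in clauses:
            if lit in c:
                continue            # clause is satisfied, drop it
            c = c - {-lit}          # remove the falsified literal
            if not c:
                return None         # a 0-clause was created: the UCA fails
            shortened.append(c)
        clauses = shortened
    return assignment
```

For instance, the formula consisting of the two unit clauses `(1,)` and `(-1,)` always yields `None`, since satisfying either unit clause empties the other.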

In a recent paper [24], the author introduced the Poisson cloning model $F_{PC}(n,p;k)$ for random
$k$-SAT formulae, which is essentially equivalent to the classical model $F(n,p;k)$ when $p = \Theta(n^{1-k})$.
That is,

Theorem 1.1 Let $k \ge 2$ and $p = \Theta(n^{1-k})$. Then there are constants $c_1$ and $c_2$ such that, for any
collection $\mathcal{F}$ of $k$-SAT formulae,

$$c_1 \Pr[F_{PC}(n,p;k) \in \mathcal{F}] \le \Pr[F(n,p;k) \in \mathcal{F}] \le c_2\big(\Pr[F_{PC}(n,p;k) \in \mathcal{F}]^{1/k} + e^{-n}\big),$$

where 

Ci = fc i/2 c S QXtKa") + o( i), Ca = e *4^(S)(?) (_!_) ( (jfc _ i K ) 1/k + (i). 
and $o(1)$ goes to 0 as $n$ goes to infinity.
The cut-off line algorithm (COLA) for the new model is also introduced in a general framework.
Using the COLA, one may generate an instance of the Poisson cloning model and simultaneously
carry out an algorithm such as the PLA. A version of the COLA applied to $F_{PC}(n,p;k)$ is analyzed to
obtain the following result for $F(n,p;k)$: Let

$$\lambda(k) := \min_{\rho > 0} \frac{\rho}{(1 - e^{-\rho})^{k-1}},$$

and $F_C(n,p;k)$ be the residual formula left after the PLA applied to $F(n,p;k)$ stops. The residual formula
is called the core of $F(n,p;k)$. The set of underlying variables of $F_C(n,p;k)$ is denoted by $C(n,p;k)$.
In other words, a variable is in $C(n,p;k)$ if and only if a clause of $F_C(n,p;k)$ contains it.

Theorem 1.2 Let X(n,p; k) = p( 2 ,™~, ) , k > 3 and a S> n -1 / 2 . Supercritical Phase: If X(n,p; k) < 
X(k) — a is uniformly bounded from below by and i (k) is the minimum i such that 2 k (l) > 2i/k, 
then 

Pr[C(n, p ; k) ^ ] < 2e-^^ + 0{n-^- 2 ' k ^ ^ ) . 
Supercritical Phase: If $\lambda := \lambda(n,p;k) = \lambda(k) + \sigma$ is uniformly bounded from above, then, for the
largest solution $\theta_\lambda$ of the equation $\theta^{1/(k-1)} - 1 + e^{-\theta\lambda} = 0$ and all $\alpha$ in the range $1 \ll \alpha \ll \sigma n^{1/2}$,

2 

Pr[ | \C(n,p;k)\ - 0pn| > a(n/a) 1/2 } = e~ n{a2) . 

In particular, the PLA succeeds with high probability if $\lambda(n,p;k) = \lambda(k) - \sigma$ with $\sigma \gg n^{-1/2}$, and it
does not succeed with high probability if $\lambda(n,p;k) = \lambda(k) + \sigma$ with $\sigma \gg n^{-1/2}$.



Most structural properties of the core can be found in [24] too. The Poisson cloning model and
the cut-off line algorithm will be presented in detail in the next section.

For $k = 2$, the PLA may fail with non-trivial probability even for $\lambda_p := p(2n-1) < 1$. For
example, there could be a pair of clauses $(y \vee z)$ and $(\bar y \vee \bar z)$ for two variables $y$ and $z$ with non-trivial
probability. Hence, we may expect, at best, that if $\lambda_p < 1$ then $F_C(n,p) := F_C(n,p;2)$
consists of variables of type (1,1) only. Here and in general, a variable $x$ is of type $(i,j)$ in a
formula if $x$ appears in $i$ clauses and $\bar x$ appears in $j$ clauses of the formula. The type of the literal
$\bar x$ is determined by the type of $x$. Taking an approach similar to the one used to analyze the structure of the
core of the random digraph [25], we will actually prove this and, in the case $\lambda_p > 1$, we prove that
$F_C(n,p)$ has many variables of type larger than (1,1) and the formula is not satisfiable whp. None of the
proofs presented here depend on [25], though.

Other interesting properties of $F_C(n,p)$ are studied too. Denote by $C_{n,p}(i,j)$ the set of all
variables of type $(i,j)$ in $F_C(n,p)$, so that $C_{n,p} = \bigcup_{(i,j) \ge (1,1)} C_{n,p}(i,j)$ is the set of underlying variables
of $F_C(n,p)$. Due to the following theorem, the structure of the core $F_C(n,p)$ can be well understood
provided tight upper and lower bounds for the $|C_{n,p}(i,j)|$'s, $(i,j) \ge (1,1)$, are found.

Theorem 1.3 Suppose two formulae have the same number of clauses on the same number of 
underlying variables, and all underlying variables are of type at least (1, 1). Then the two formulae 
are equally likely to be the core of F(n,p). 

The proof of the theorem is not difficult and is presented in a later section.

For a variable $x$ of type (1,1) in $F_C(n,p)$, the conjunction $(x \vee y) \wedge (\bar x \vee z)$ of the two clauses containing
$x$ and $\bar x$ may be replaced by $(y \vee z)$. The replacement is called a resolution of $x$. It is clear that
satisfiability is not affected by a series of such resolutions. The formula obtained after all possible
resolutions of type (1,1) variables is called the kernel of $F(n,p)$ and denoted by $F_K(n,p)$. It is
worth noticing that clauses in $F_K(n,p)$ may not consist of strictly distinct literals. Clearly, all
variables of $F_K(n,p)$ are of type larger than (1,1), counting a loop $(x \vee x)$ twice in the degree of $x$.
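
The types and the resolution step can be illustrated as follows. This is a sketch under assumptions, not the paper's procedure: the helper names `variable_types` and `resolve_type_11` are hypothetical, literals are nonzero integers with `-v` negating `v`, every clause has exactly two literals, and a tautological clause $(x \vee \bar x)$ produced along the way is simply dropped.

```python
from collections import Counter

def variable_types(clauses):
    """Map each variable x to its type (i, j): x occurs i times, its negation j times.

    Occurrences are counted with multiplicity, so a loop (x v x) counts twice.
    """
    pos, neg = Counter(), Counter()
    for c in clauses:
        for lit in c:
            (pos if lit > 0 else neg)[abs(lit)] += 1
    return {v: (pos[v], neg[v]) for v in set(pos) | set(neg)}

def resolve_type_11(clauses):
    """Resolve type (1,1) variables of a 2-CNF one at a time until none is left."""
    clauses = [tuple(c) for c in clauses]
    while True:
        types = variable_types(clauses)
        x = next((v for v in sorted(types) if types[v] == (1, 1)), None)
        if x is None:
            return clauses
        i = next(k for k, c in enumerate(clauses) if x in c)    # clause with x
        j = next(k for k, c in enumerate(clauses) if -x in c)   # clause with not-x
        rest = [c for k, c in enumerate(clauses) if k not in (i, j)]
        if i != j:
            y = clauses[i][1 - clauses[i].index(x)]     # the other literal next to x
            z = clauses[j][1 - clauses[j].index(-x)]    # the other literal next to not-x
            rest.append((y, z))                         # (x v y)(not-x v z) -> (y v z)
        # If i == j the clause is (x v not-x); it is always true and is dropped.
        clauses = rest
```

Each pass removes one clause, so the loop terminates. Applied to the core of a formula, the returned clauses play the role of the kernel; for `[(1, 2), (-1, -2)]` the result is empty, matching the case in which all core variables are of type (1,1).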

Let $D_{n,p}(i,j) = \bigcup_{(i',j') \ge (i,j)} C_{n,p}(i',j')$. Then $D_{n,p}(1,1) = C_{n,p}$ and $K_{n,p} := D_{n,p}(2,1) \cup D_{n,p}(1,2)$
is the set of underlying variables of $F_K(n,p)$. When the $C_{n,p}(i',j')$ are all small for $(i',j') \ge (i,j)$,
it is sometimes more useful and/or easier to bound the size of $D_{n,p}(i,j)$ rather than the individual
$C_{n,p}(i',j')$. We also denote by $M_{n,p}(i,j)$ the sum of degrees of all variables in $D_{n,p}(i,j)$ and their
negations, where the degree $d(y)$ of a literal $y$ is the number of clauses containing it. Clearly,

$$M_{n,p}(i,j) = \sum_{(i',j') \ge (i,j)} (i' + j')\,|C_{n,p}(i',j')|.$$

Notice that the numbers of clauses in $F_C(n,p)$ and $F_K(n,p)$ are $\frac{1}{2}M_{n,p}(1,1)$ and $\frac{1}{2}(M_{n,p}(1,2) +
M_{n,p}(2,1) - M_{n,p}(2,2))$, respectively. Finally, we set



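$$P_\ell(\mu) = \Pr[\mathrm{Poi}(\mu) = \ell] = \frac{\mu^\ell}{\ell!}\,e^{-\mu}, \quad \text{and} \quad Q_\ell(\mu) = \Pr[\mathrm{Poi}(\mu) \ge \ell] = \sum_{i \ge \ell} P_i(\mu).$$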



In statements in theorems, lemmas and corollaries of this paper, we use the following convention. 

Convention: When we say that a statement is true for all $\alpha$ in the range $a \ll \alpha \ll b$, it actually
means that there is a (small) constant $\varepsilon > 0$ so that the statement is true for $\alpha$ in the range
$a/\varepsilon \le \alpha \le \varepsilon b$.

Theorem 1.4 Suppose p(2n — 1) = 1 + a is uniformly bounded from above with a 3> n -1 / 3 . Let 
A = 1 + a, A > and K a < {0 x n) 1 / 2 . Then, for fixed and Pi = Pi{O x X) andQi = Qi{6 x \), 



Pr 



C n>p (iJ)\ - PPjn > a^- 3 ^ 3 ™) 1 / 2 



and, assuming i > j, 



Pl- 
ane? 



|-D„, p (z,i) U L> n , p (j, »)| - (2QiQj - QiQ, 



n 



> a^- 3 (^ n )i/2 



Pr 

Moreover. 



M n , p (i,j) - e.X^Qi-xQj + QiQj-^ > a6^- z {eln r / 2 



Pr[p njP (i,i)| >£} <0 



(£!)V2 



(i+i)*/2 



+ e 



-f2(a 2 ) 



A stronger theorem (Theorem 3.3; see also the Main Lemma in Section 3) is to be proved first,
and Theorem 1.4 will follow as a corollary. Bounds for the sizes of the core and the kernel may
be obtained from Theorem 1.4. Estimates for $|F_C(n,p)|$ and $|F_K(n,p)|$ are possible too, where, in
general, $|F|$ is the number of clauses in the formula $F$.

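Corollary 1.5 For the core $F_C(n,p)$ of $F(n,p)$ and the set $C_{n,p}$ of underlying variables of the core,

$$\Pr\Big[\,\big|\,|C_{n,p}| - \theta_\lambda^2 n\,\big| \ge \alpha(\theta_\lambda n)^{1/2}\Big] \le e^{-\Omega(\alpha^2)}$$

and

$$\Pr\Big[\,\big|\,|F_C(n,p)| - \theta_\lambda^2 \lambda n\,\big| \ge \alpha(\theta_\lambda n)^{1/2}\Big] \le e^{-\Omega(\alpha^2)}.$$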



For the kernel $F_K(n,p)$ of $F(n,p)$ and the set $K_{n,p}$ of underlying variables of the kernel,



Pr 



\K, 



n.p | 



> a(6° x n 



3 r,W 2 



< e 



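$$\Pr\Big[\,\big|\,|F_K(n,p)| - \theta_\lambda^2\lambda(1 - \lambda e^{-2\theta_\lambda\lambda})n\,\big| \ge \alpha(\theta_\lambda^3 n)^{1/2}\Big] \le e^{-\Omega(\alpha^2)}.$$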



In brief, we may also have 

Corollary 1.6 Let $\lambda = 1 + \sigma$ with $n^{-1/3} \ll \sigma < 1$. Then, with high probability, the pure literal
algorithm applied to $F(n, \frac{\lambda}{2n-1})$ stops leaving $\Theta(\sigma^2 n)$ type (1,1) variables, $\Theta(\sigma^3 n)$ type (2,1) or
(1,2) variables, and $\Theta(\sigma^4 n)$ clauses containing variables of other types. Moreover, once the $C_{n,p}(i,j)$,
$(i,j) \ge (1,1)$, are given, the residual formula is the uniform random formula conditioned on the
$C_{n,p}(i,j)$.
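
For a feel of the orders of magnitude, $\theta_\lambda$ can be computed numerically. The sketch below is illustrative only: the function names are hypothetical, bisection is just one convenient root finder, and the type (1,1) count $P_1(\theta_\lambda\lambda)^2 n$ is read off from the centering term $P_iP_jn$ of Theorem 1.4, which is an assumption about that (partly illegible) statement.

```python
import math

def theta(lam, iters=200):
    """Larger solution of theta - (1 - exp(-theta * lam)) = 0 (0 if lam <= 1)."""
    if lam <= 1.0:
        return 0.0
    f = lambda t: t - 1.0 + math.exp(-t * lam)
    # f is convex with f(0) = 0 and minimum at log(lam)/lam, so the positive
    # root is the only sign change of f on [log(lam)/lam, 1].
    lo, hi = math.log(lam) / lam, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def poisson_pmf(l, mu):
    """P_l(mu) = Pr[Poi(mu) = l]."""
    return math.exp(-mu) * mu ** l / math.factorial(l)

def predicted_core(lam, n):
    """Leading-order predictions for the core of F(n, lam / (2n - 1))."""
    t = theta(lam)
    return {
        "variables": t * t * n,          # theta^2 n
        "clauses": t * t * lam * n,      # theta^2 lam n
        "type_11": poisson_pmf(1, t * lam) ** 2 * n,  # assumed P_1^2 n
    }
```

For small $\sigma$ the root behaves like $\theta_\lambda \approx 2\sigma$, so `predicted_core` gives about $4\sigma^2 n$ variables, almost all of type (1,1), in line with the $\Theta(\sigma^2 n)$ count in Corollary 1.6.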

The analysis of the structure of the core yields almost optimal bounds for the probability of
satisfiability, improving the bounds of Bollobás et al. [8].

Theorem 1.7 If X p = 1 — a is uniformly bounded from below by with n -1 / 3 < a < 1, then, with 
probability 1 — ^ , all the variables in C(n,p) are of type (1, 1). That is, 



Pr 



K(n,p) = 



1 



15 + o(l) 
I60- 3 



1.1) 



In particular, $\Pr[K(n,p) = \emptyset] = 1 - \Theta((\sigma^3 n)^{-1})$ for all $\sigma$ in the range $n^{-1/3} \ll \sigma < 1$. We also have

$$\Pr[F(n,p) \text{ is satisfiable}] = 1 - \frac{1 + o(1)}{16\sigma^3 n}.$$



Theorem 1.8 If $\lambda_p = 1 + \sigma$ is uniformly bounded from above, then $F(n,p)$ is unsatisfiable with
probability $1 - e^{-\Theta(\sigma^3 n)}$, i.e.,

$$\Pr[F(n,p) \text{ is satisfiable}] = e^{-\Theta(\sigma^3 n)}.$$



In the next section, we present the Poisson cloning model and the cut-off line algorithm together
with a useful large deviation inequality called the generalized Chernoff bound. Then, Theorem 1.4
and Corollaries 1.5 and 1.6 will be proven in Section 3. The section after that contains the proofs of
Theorems 1.3, 1.7 and 1.8.



2 Poisson Cloning Model and Cut-Off Line Algorithm 

Poisson Cloning Model: The Poisson cloning model is partially motivated by the fact that the
degree $d(y)$ of a literal $y$ in $F(n,p)$ has the binomial distribution $\mathrm{Bin}(2n-1,p)$, which is close to
$\mathrm{Poi}(p(2n-1))$ when $p = \Theta(n^{-1})$. Here the degree $d(y)$ of $y$ is the number of clauses in $F(n,p)$
containing $y$ and

$$\Pr[\mathrm{Bin}(2n-1,p) = \ell] = \binom{2n-1}{\ell}p^\ell(1-p)^{2n-1-\ell}, \qquad \Pr[\mathrm{Poi}(\lambda) = \ell] = e^{-\lambda}\frac{\lambda^\ell}{\ell!}.$$
Though the degrees $d(y)$ are not exactly independent, they are expected to behave like i.i.d. random
variables. Thus, it has been desirable to introduce a new model for the random $k$-SAT formulae
in which the degrees are i.i.d. Poisson random variables. Inspired by the configuration model for
random regular graphs, see e.g. [5], [6], [7], and [29], the author has introduced the Poisson cloning
model with the desired properties and shown that the new model is not much different from the
classical model in the sense of Theorem 1.1.

To analyze various properties of random graphs and random SAT formulae such as cores and 
giant components, the cut-off line algorithm is introduced too. In this section, we present the 
Poisson cloning model and the cut-off line algorithm, and related lemmas as well as a large deviation 
inequality called generalized Chernoff bound. 

For a new random 2-SAT model $F_{PC}(n,p;2)$, we take i.i.d. Poisson random variables $d_y$ with mean
$\lambda := p(2n-1)$, one for each $y$ in the set $Y$ of all literals, and then take $d_y$ copies of each $y$. The copies
of a literal $y$ are called clones of $y$, or simply $y$-clones. Since the sum of Poisson random variables
is also Poisson, the total number $N_\lambda := \sum_{y \in Y} d_y$ of clones is a Poisson random variable with mean $2\lambda n$. It
is sometimes convenient to take a reverse, but equivalent, construction. We first take a Poisson
random variable $N_\lambda$ with mean $2\lambda n$ and then take $N_\lambda$ unlabelled clones. Each clone is independently labelled
as a $y$-clone uniformly at random, in the sense that $y$ is chosen uniformly at random from $Y$. It is
well-known that the numbers $d_y$ of $y$-clones are then i.i.d. Poisson random variables with mean $\lambda$.

If $N_\lambda$ is even, the formula $F_{PC}(n,p;2)$ is defined by generating a (uniform) random perfect
matching on those $N_\lambda$ clones and contracting the clones of each literal $y$ into $y$. That is, an edge consisting
of a $y$-clone and a $z$-clone in the perfect matching yields the clause $(y \vee z)$ in $F_{PC}(n,p;2)$ with
multiplicity. If $y = z$, it produces a loop $(y \vee y)$, which contributes 2 to the degree of $y$. It turns
out that there are many ways to generate the random perfect matching and we may choose one
that makes a given problem easier to analyze. Some specific ways will be discussed when the cut-off
line algorithm is introduced.

If $N_\lambda$ is odd, we arbitrarily choose a clone, say a $y$-clone. This clone induces a 1-clause, called
a defected clause, consisting of $y$. The defected clause contributes only 1 to the degree of the
corresponding literal. The same procedure as in the case of even $N_\lambda$ is then carried out for the rest
of the clones. Strictly speaking, $F_{PC}(n,p;2)$ varies depending on how the defected clause is constructed.
However, for any collection $\mathcal{F}$ of 2-SAT formulae, the probability that $F_{PC}(n,p;2)$ is in $\mathcal{F}$ does not
depend on how the defected clause is chosen (for odd $N_\lambda$), since $F_{PC}(n,p;2) \notin \mathcal{F}$ whenever there
is a non-standard clause in $F_{PC}(n,p;2)$. Thus it is normally unnecessary to describe $F_{PC}(n,p;2)$
for odd $N_\lambda$. For $k \ge 3$, the Poisson cloning model $F_{PC}(n,p;k)$ for random $k$-SAT problems may
be similarly defined.
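
The construction for $k = 2$ can be sketched as below. This is a minimal illustration under assumptions rather than the author's code: the names `poisson` and `sample_poisson_cloning_2sat` are hypothetical, literals are encoded as nonzero integers (`-v` negates `v`), Poisson variates are drawn with Knuth's multiplication method (adequate for small means only), and the uniform perfect matching is obtained by shuffling the clones and pairing consecutive entries.

```python
import math
import random

def poisson(mu, rng):
    """Draw a Poisson(mu) variate by Knuth's multiplication method (small mu only)."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def sample_poisson_cloning_2sat(n, lam, rng=None):
    """Sample the Poisson cloning 2-SAT model with lam = p(2n - 1).

    Returns (clauses, defected), where defected is the literal of the defected
    1-clause when the number of clones is odd, and None otherwise.
    """
    rng = rng or random.Random()
    literals = list(range(1, n + 1)) + [-v for v in range(1, n + 1)]
    # d_y i.i.d. Poisson(lam) clones for every literal y.
    clones = [y for y in literals for _ in range(poisson(lam, rng))]
    # A uniform shuffle followed by pairing neighbours is a uniform perfect matching.
    rng.shuffle(clones)
    defected = clones.pop() if len(clones) % 2 else None
    clauses = [(clones[i], clones[i + 1]) for i in range(0, len(clones), 2)]
    return clauses, defected
```

Loops such as `(y, y)` and repeated clauses can occur, exactly as in the model; they are rare when $\lambda$ is bounded.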

Theorem 1.1 has been proved using somewhat straightforward computations for $\Pr[F(n,p;k) = F]$
and $\Pr[F_{PC}(n,p;k) = F]$.

Cut-Off Line Algorithm (COLA): To generate a uniform random perfect matching on $N_\lambda$
clones, we may keep matching two unmatched clones uniformly at random. Another way is to
choose the first clone as we like and match it to a clone chosen uniformly at random among all
other unmatched clones. Clearly, there are many ways to choose the first clone. This is a big
advantage since we may select a way that makes the given problem easier to analyze. In general, a
sequence of choice functions will tell how to choose the first clone at each step. A choice function
may be deterministic or random. If $N_\lambda$ is even, this yields a uniform perfect matching
regardless of what the choice functions are. If only one clone, say of $y$, remains unmatched, we just
add the defected clause consisting of $y$.

It is useful to introduce a more specific way to choose the second clone uniformly at random. The
way presented here will be useful to analyze some algorithms like the PLA. First, we independently
assign, to each clone, a uniform random real number between 0 and $\lambda$. For the sake of convenience,
we say that a clone is the largest, smallest, etc. if so is its assigned number. Each choice function
is to choose an unmatched clone without changing the (joint) distribution of the numbers assigned
to all other unmatched clones. A choice function satisfying this condition is called oblivious. For
instance, a choice function is oblivious if it chooses a clone of a pure literal. If a choice function
chooses a largest $v$-clone, it is not oblivious, as it changes the distribution of the numbers assigned
to other unmatched $v$-clones.

Once an unmatched clone is chosen by an oblivious choice function, the largest clone among all
other unmatched clones is to be matched to the chosen clone. This may be further implemented
using the Poisson $\lambda$-cell: First, map a $y_j$-clone with assigned number $r$ to the point $(r,j)$ in the
two dimensional plane. One may think that there are $2n$ horizontal line segments in $\mathbb{R}^2$ from $(0,j)$
to $(\lambda,j)$, $j = 1,\dots,2n$, and, on each line segment, there are $d_{y_j}$ i.i.d. uniform points that tell the
assigned numbers for the $d_{y_j}$ clones of $y_j$. This rectangular configuration is called a Poisson $\lambda$-cell.
Each line segment of the Poisson $\lambda$-cell with the points is an independent Poisson arrival process
with density 1, up to time $\lambda$.

The cut-off line algorithm (COLA) can be described as follows. Initially, the cut-off line is the
vertical line in $\mathbb{R}^2$ containing the point $(\lambda,0)$. At the first step, once the oblivious choice function
chooses a clone, we move the cut-off line to the left until a clone is on the line. The clone is
clearly the largest unmatched clone, excluding the chosen clone. The new cut-off value, denoted
by $\Lambda_1$, is the number assigned to the clone. The new cut-off line is, of course, the vertical line
containing $(\Lambda_1,0)$. Repeating this procedure, one may obtain the $i$th cut-off value $\Lambda_i$ and the
corresponding cut-off line. It is crucial to note that, provided all choice functions are oblivious,
once $\Lambda_i$ is given then all numbers assigned to unmatched clones are i.i.d. uniform random numbers
between 0 and $\Lambda_i$.
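
The matching rule of the COLA can be sketched generically. The code below is an illustration under assumptions, not the implementation analyzed in [24]: `cola_matching` and its `choose` callback are hypothetical names, clones are identified by their list index, and the largest unmatched clone is found by a plain linear scan. The callback is handed only the unmatched indices and the labels, never the assigned numbers, which is a simple way to keep it oblivious.

```python
import random

def cola_matching(clone_labels, lam, choose, rng=None):
    """Match clones by the cut-off line rule (illustrative sketch).

    clone_labels: literal label of each clone (index = clone id).
    choose(unmatched, clone_labels) -> id of the clone picked by the choice function.
    Returns (matching, cutoffs, leftover): matched pairs, the successive cut-off
    values, and the set of unmatched ids (one id if the number of clones is odd).
    """
    rng = rng or random.Random()
    value = [rng.uniform(0.0, lam) for _ in clone_labels]  # assigned numbers
    unmatched = set(range(len(clone_labels)))
    matching, cutoffs = [], []
    while len(unmatched) >= 2:
        c = choose(unmatched, clone_labels)
        unmatched.remove(c)
        # Moving the cut-off line left stops at the largest other unmatched clone.
        partner = max(unmatched, key=value.__getitem__)
        unmatched.remove(partner)
        matching.append((c, partner))
        cutoffs.append(value[partner])  # the new cut-off value
    return matching, cutoffs, unmatched
```

With, say, `choose = lambda unmatched, labels: min(unmatched)`, the cut-off values come out strictly decreasing, since each partner is the maximum of a shrinking set of candidates.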

For $\theta$ in the range $0 \le \theta \le 1$, let $\Lambda(\theta)$ be the cut-off value when $2(1-\theta^2)\lambda n$ or more clones are
matched for the first time. Conversely, let $N(\theta)$ be the number of matched clones until the cut-off
line reaches $\theta\lambda$. Two versions of the cut-off line lemma have been proven in [24].



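Lemma 2.1 (Cut-off Line Lemma) Let $\lambda > 0$ be fixed. Then, for $\theta_1 < 1$ uniformly bounded below
from 0 and $0 < \Delta < n$,

$$\Pr\Big[\max_{\theta:\,\theta_1 \le \theta \le 1} |\Lambda(\theta) - \theta\lambda| \ge \frac{\Delta}{n}\Big] \le 2e^{-\Omega(\min\{\Delta,\ \Delta^2/((1-\theta_1)n)\})}$$

and

$$\Pr\Big[\max_{\theta:\,\theta_1 \le \theta \le 1} |N(\theta) - 2(1-\theta^2)\lambda n| \ge \Delta\Big] \le 2e^{-\Omega(\min\{\Delta,\ \Delta^2/((1-\theta_1)n)\})}.$$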



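For the Poisson $\lambda$-cell conditioned on $N_\lambda = N$, a similar lemma may be obtained.

Lemma 2.2 (Cut-off Line Lemma for N clones) Let $k \ge 2$, $\lambda > 0$ be fixed. Then, for the Poisson
$\lambda$-cell conditioned on $N_\lambda = N$, and for $\theta_1 < 1$ uniformly bounded below from 0 and $0 < \Delta < N$,

$$\Pr\Big[\max_{\theta:\,\theta_1 \le \theta \le 1} |\Lambda(\theta) - \theta\lambda| \ge \frac{\Delta}{N}\Big] \le 2e^{-\Omega(\min\{\Delta,\ \Delta^2/((1-\theta_1)N)\})}$$

and

$$\Pr\Big[\max_{\theta:\,\theta_1 \le \theta \le 1} |N(\theta) - (1-\theta^2)N| \ge \Delta\Big] \le 2e^{-\Omega(\min\{\Delta,\ \Delta^2/((1-\theta_1)N)\})}.$$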
For the proof of the cut-off line lemma, a large deviation inequality, called the generalized Chernoff
bound, has been used. Here, we present a version of it that is useful for our analysis. A proof can
be found in [24].



Lemma 2.3 (Generalized Chernoff bound) Let X\, X m be a sequence of random variables. Sup- 
pose 

E[Xi\Xi, < fa, 



and there are a i , b i and £ so that 



E[(X i -Hi) 2 \X 1 ,...,X i ^ l ] < ai 



and 



E[(Xi - Hife^ Xi ~^\Xx, ...,AVi] < bi for all < £ < £ . 
Iffi£o YlT=i b i - Ya=i a i f or some < 5 < 1, then 



(2.1) 

(2.2) 
(2.3) 



m m 
i=l i=l 

for all A > 0. Furthermore, if X\, ...,X m are independent and satisfy 112. ty) for fii = E[Xi] and 

E[(Xi - E[Xi]) 3 e^ Xi ~ E[Xi]) ] < k for all £ in the range \£\ < £ , 
then 5£ bi < Y1T a t f or < 5 < 1 implies that 

< e -i min ^oA, r ^}) 



(2.4) 



Pr [|£^-E*« 

i=i i=i 



> A 



for all A > 0. 



We conclude this section by presenting a corollary that can be applied to random walks with 
negative drift. 

Corollary 2.4 Suppose \2.1\) - l2lfy hold with fit = —h for a constant (3 > 0. If d£ < J2 a i 
for some < S < 1, then 



m 
i=l 



< e -n(min{*e (A+fcm), (A+hm) 2 / i <*}) _ 



3 Pure literal algorithm for the random 2-SAT problem 

As mentioned in the previous section, the COLA is useful to realize some algorithms like the PLA.
The following specific COLA is used to analyze the structure of the core of $F_{PC}(n,p)$.

COLA (for core): Construct a Poisson $\lambda$-cell. If a variable is of type $(0,i)$ or $(i,0)$, put all clones
of it and its negation into a stack in an arbitrary order. This does not mean that the clones are
removed from the $\lambda$-cell.

(a) If the stack is empty, go to (b). If the stack is nonempty, choose the first clone in the stack
and move the cut-off line to the left until the largest unmatched clone, excluding the chosen clone,
is found. (The stack naturally defines choice functions.) Then, match the largest unmatched clone
to the chosen clone. Remove all matched clones from the stack and from the cell. If there are new
variables of type $(0,i)$ or $(i,0)$, then put all clones of them and their negations in the stack. Repeat (a).

(b) Choose a clone uniformly at random from all unmatched clones and put it in the stack. Then,
go to (a).

The steps carried out by the instruction described in (b) are called free steps, as one is free to choose any
clone. We will call unmatched clones of pure literals light and the other unmatched clones heavy. A
literal is called heavy if it is not pure.

According to the cut-off line lemma, one may expect that there are $2\theta^2\lambda n$ unmatched clones
(when the cut-off line is) at $\theta\lambda$. The number of heavy clones at $\theta\lambda$ is expected to be close to
$2(1 - e^{-\theta\lambda})\theta\lambda n = \Theta(\theta^2 n)$. (See (3.3) below.) Thus, the number of light clones seems to be close to

$$2\theta^2\lambda n - 2(1 - e^{-\theta\lambda})\theta\lambda n = 2\theta\lambda n\,(\theta - 1 + e^{-\theta\lambda}),$$

which is $\Theta(\theta^3 n)$ provided $\theta \gg |\lambda - 1|$. If $\theta$ is small, however, this observation would give us no
information. This is due to the fact that the standard deviation for the number of heavy clones
is $\theta n^{1/2}$ so that, for $\theta^3 n \ll \theta n^{1/2}$, or $\theta \ll n^{-1/4}$, it is unclear whether the number of light clones is
positive or not.

A more careful analysis starts from the observation that, when $\theta$ is small, most heavy variables
are of type (1,1) and that the two clauses containing such a variable and its negation may be resolved
to one clause. In other words, the two clauses $(x \vee y)$ and $(\bar x \vee z)$ may be replaced by $(y \vee z)$, which
is called a resolution. After a series of such resolutions, all variables of type (1,1) may disappear.

To take an advantage of this fact, we will introduce many phases. Let 1Q A < (3 < — j 2 ^- The 
first phase starts at the beginning of the whole process. For $j \ge 1$, the $j$th phase ends and the
$(j+1)$th phase begins when the cut-off line reaches $(1-\beta)^j\lambda$. At the beginning of each phase,
all variables of type (1,1) and their unmatched clones are called passive. All other unmatched
clones are called active. These terms do not change until the beginning of the next phase. So,
variables that become type (1,1) only after the current phase starts remain active until the end of
the phase. Once a clone becomes pure, it plays the same role regardless of being passive or active.
The procedure (b) of the COLA also needs to be replaced by

(b)* Choose a clone uniformly at random from all unmatched active clones and put it in the stack.
If there is no active clone, stop. Otherwise, go to (a).

As a stack is used, if one of the two unmatched clones of a passive variable and its negation
is matched in a step then the choice function in the next step must choose the other clone.
Thus, the situation is exactly the same except that the number of passive variables decreases by 1.
This means that the COLA applied without passive clones is essentially the same as the original
algorithm. In this sense, we may say that two active clones are matched if they are matched after the
resolutions of matched passive clones. Here the resolution has the natural meaning: Two edges
$\{z_1, z_2\}$, $\{z_3, z_4\}$ with clones $z_2$, $z_3$ of a passive variable and its negation are reduced to the one edge
$\{z_1, z_4\}$. Conversely, an active clone may be regarded as unmatched if it is not matched or it is not
matched after the resolutions.

Let $\Lambda_C$ be the cut-off value when no light clone remains for the first time in the COLA applied
to the Poisson $\lambda$-cell. The main lemma shows that $\Lambda_C$ is highly concentrated near $\theta_\lambda\lambda$, as expected,
with standard deviation $(\theta_\lambda n)^{-1/2}$. Once $\Lambda_C$ is determined, the unmatched clones form the Poisson
$\Lambda_C$-cell without pure literals.

Lemma 3.1 (Main Lemma) Let $\lambda = 1 + \sigma$ with $\sigma \gg n^{-1/3}$. Then, for all $\alpha$ with $1 \ll \alpha \ll (\theta_\lambda^3 n)^{1/2}$,

Pr[|A c - 9 X X\ > a(^n)- 1 / 2 ] = 

For the proof, we first estimate the number of active clones at the beginning of each phase. Let
$N_j$ be the number of active clones at the beginning of the $j$th phase and let $M_j$ be the number of
matched active clones during the entire $j$th phase. Then, the cut-off line lemma for $N_j$ clones, or
Lemma 2.2, gives

A 2 

Pr[ \Mj - (1 - (1 - (3) 2 )Nj\ > A\Nj] < 2e ^ mi ^ A 'Njy\ (3.!) 

Notice that the number $N_{j+1}$ of active clones at the beginning of the next phase is $N_j - M_j - 2B_j$,
where $B_j$ is the number of variables of type (1,1) at $(1-\beta)^j\lambda$ that were of type larger than (1,1)
at $(1-\beta)^{j-1}\lambda$. (Recall that an active clone is regarded as unmatched if it is not matched or it
is not matched after resolutions.) For a literal $y$ and $0 \le \theta \le \theta' \le 1$, denote by $d_y(\theta,\theta')$ the
number of $y$-clones larger than or equal to $\theta\lambda$ and smaller than $\theta'\lambda$, and let $d_y(\theta) = d_y(0,\theta)$. Then,
for $\theta_j = (1-\beta)^{j-1}$,

$$B_j = \sum_{x \in X} 1\big(d_x(\theta_{j+1}) = d_{\bar x}(\theta_{j+1}) = 1\big)\,1\big(d_x(\theta_{j+1},\theta_j) + d_{\bar x}(\theta_{j+1},\theta_j) \ge 1\big).$$

Observe that $(d_x(\theta_{j+1}), d_{\bar x}(\theta_{j+1}), d_x(\theta_{j+1},\theta_j), d_{\bar x}(\theta_{j+1},\theta_j))$, $x \in X$, are i.i.d. 4-tuples of independent
Poisson random variables with means $\theta_{j+1}\lambda$, $\theta_{j+1}\lambda$, $(\theta_j - \theta_{j+1})\lambda$, $(\theta_j - \theta_{j+1})\lambda$, respectively. Applying
the generalized Chernoff bound, we have

-^(miniA,^-}) 

Pr[ \Bj - {9 j+1 X) 2 e- 2d ^ x {\ - e~ 2(i ^ x )n\ > A] < 2e . (3.2) 

Therefore, $N_{j+1}$ is expected to be close to $(1-\beta)^2 N_j - 2(\theta_{j+1}\lambda)^2 e^{-2\theta_{j+1}\lambda}(1 - e^{-2\beta\theta_j\lambda})n$. Applying
this inductively, we expect that $N_j$ is close to $2\theta_j^2\lambda(1 - \lambda e^{-2\theta_j\lambda})n$.
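
The induction behind this estimate can be checked directly. The following short verification assumes only $\theta_{j+1} = (1-\beta)\theta_j$ and writes $n_j$ and $b_j$ for the predicted values of $N_j$ and $B_j$ (the notation used later in the proof of Lemma 3.2):

```latex
\begin{align*}
n_j &:= 2\theta_j^2\lambda\,\bigl(1-\lambda e^{-2\theta_j\lambda}\bigr)\,n, \qquad
b_j := (\theta_{j+1}\lambda)^2 e^{-2\theta_{j+1}\lambda}\,\bigl(1-e^{-2\beta\theta_j\lambda}\bigr)\,n,\\
(1-\beta)^2 n_j - n_{j+1}
  &= 2\theta_{j+1}^2\lambda^2\,\bigl(e^{-2\theta_{j+1}\lambda}-e^{-2\theta_j\lambda}\bigr)\,n\\
  &= 2(\theta_{j+1}\lambda)^2 e^{-2\theta_{j+1}\lambda}\,\bigl(1-e^{-2(\theta_j-\theta_{j+1})\lambda}\bigr)\,n
   = 2b_j ,
\end{align*}
```

where the last step uses $\theta_j - \theta_{j+1} = \beta\theta_j$. Hence $n_{j+1} = (1-\beta)^2 n_j - 2b_j$ holds exactly. For $j = 1$, $n_1 = 2\lambda(1 - \lambda e^{-2\lambda})n$ is the expected number $2\lambda n$ of clones minus the expected number $2\lambda^2 e^{-2\lambda}n$ of clones belonging to variables of type (1,1).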



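Let

$$H_j = \sum_{x \in X} \big(d_x(\theta_j) + d_{\bar x}(\theta_j)\big)\,1\big((d_x(\theta_j), d_{\bar x}(\theta_j)) > (1,1)\big)$$

$$\phantom{H_j} = \sum_{x \in X} \big(d_x(\theta_j) + d_{\bar x}(\theta_j)\big)\,1\big((d_x(\theta_j), d_{\bar x}(\theta_j)) \ge (1,1)\big) - 2\sum_{x \in X} 1\big(d_x(\theta_j) = d_{\bar x}(\theta_j) = 1\big).$$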
xex ' x&x 

Then $H_j$ is the number of active heavy clones at the beginning of the $j$th phase unless there is a free
step before $\theta_j\lambda$. Generally, $H_j$ is an upper bound for the number of heavy clones and $L_j := N_j - H_j$
is a lower bound for the number of light clones. The bounds may be strict only when there is a
free step before the cut-off line reaches $\theta_j\lambda$.

As (d x (9j),d x (9j)) are i.i.d pairs of independent Poisson random variables with mean 9jX, the 
generalized Chernoff bound gives 



Pr 



1 -^(min{A,^-}) 

Hj-29jX(l-e~ e i x -9jXe~ 2e i x )n >A < 2e T* . (3.3) 



Suppose A = 1 + a with a » ra -1 / 3 and 1 < a < (fl 3 ™) 1 / 2 . We take < (3 < so that 
(1 — (3) a ~ l = 9 X + a(9 x n)~ 1 ^ 2 for an integer o. Let 



Aj = O.Ola(^n) 1 / 2 ^(1 - 0) 



3 

2 j — i — a 

4 



1=1 
Then, since (1-/3)40. 3 ^ 2 = (1— /3) 3 / 2 (l — /3) 5j / 4 increase as j increases, and a = 6> A +a(0 A n) x / 2 = 
(1 + o(l))0 A , we have 

q(1 - / 9)^(0|n)- 1 / 2 < a(0 3 n)- 1 / 2 < 1. (3.4) 

and 



Aj = O.Ola(0|n) 1 / 2 - /3)^P" = O.Ola(0]n) 1 / 2 (l - /3)V ^(1 - 0) V « 0| 



8=1 



1=1 



for all j = 1, a. 

Lemma 3.2 For all t = 1, ...,a ; we Ziaue 



Pr 



3 j = 1, s.t. \Nj - 20 2 A(1 - \e- 2d > x )n\ > Aj 



and 



Pr 



3 j = 1, £ si. |Lj - 26 j \(6 j - 1 + e~^ A )n| > 2A 



< e -f1(a 2 (l-/3)^-) 



Proof. Let = 20 2 A(1 - Ae- 2 ^ A )n, = (0 j+1 A) 2 e - 2 ^+ lA (l - e" 2 ^ A )n and a } = (1-0) 
Then ay < (tfjn) 1 / 2 by ([33]) and 



4 a. 



n J+ i = (1 — P) 2 rij — 2b j. 



Since 



Pr 



and 



Pr 



3 j = 1, ...,£ + 1 s.t. \Nj -rijl > Aj 



■. p r 
+ Pr 



3j = l,...,£ s.t. \Nj-nj\yAj 
\Nj -nj\< Aj Vj < £, \N e+1 - n | > A e+1 



\Nj -nj\< Aj Vj < I, \N e+ i - n e+1 \ > A m < Pr \N e - n t \ < A e , \N e+1 - n e+1 \ > A m 



it is enough by Y^jtX e~ Q{a ^ = e ~ n ^+^ to show that 



Pt+i ■= Pr 



\N t -n t \< Ae, \N e+1 - n \ > A m 



< e -^(a!+i)_ 



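Notice that $N_{\ell+1} = N_\ell - M_\ell - 2B_\ell$, $n_{\ell+1} = (1-\beta)^2 n_\ell - 2b_\ell$ and

$$|N_{\ell+1} - n_{\ell+1}| \le |N_\ell - M_\ell - (1-\beta)^2 N_\ell| + (1-\beta)^2|N_\ell - n_\ell| + |(1-\beta)^2 n_\ell - n_{\ell+1} - 2B_\ell|$$
$$= |M_\ell - (1 - (1-\beta)^2)N_\ell| + (1-\beta)^2|N_\ell - n_\ell| + 2|B_\ell - b_\ell|.$$

The identity $\Delta_{\ell+1} = (1-\beta)^2\Delta_\ell + 0.01\,\alpha_{\ell+1}(\theta_{\ell+1}^3 n)^{1/2}$ and the bound $\alpha_j \ll (\theta_j^3 n)^{1/2}$ give

$$P_{\ell+1} \le \Pr\Big[|B_\ell - b_\ell| \ge \tfrac{1}{400}\alpha_{\ell+1}(\theta_{\ell+1}^3 n)^{1/2}\Big] + \Pr\Big[|M_\ell - (1 - (1-\beta)^2)N_\ell| \ge \tfrac{1}{200}\alpha_{\ell+1}(\theta_{\ell+1}^3 n)^{1/2},\ |N_\ell - n_\ell| < \Delta_\ell\Big]$$
$$\le e^{-\Omega(\alpha_{\ell+1}^2)} + \Pr\Big[|M_\ell - (1 - (1-\beta)^2)N_\ell| \ge \tfrac{1}{200}\alpha_{\ell+1}(\theta_{\ell+1}^3 n)^{1/2} \ \Big|\ |N_\ell - n_\ell| < \Delta_\ell\Big].$$

The desired bound follows, since (3.1) yields

$$\Pr\Big[|M_\ell - (1 - (1-\beta)^2)N_\ell| \ge \tfrac{1}{200}\alpha_{\ell+1}(\theta_{\ell+1}^3 n)^{1/2} \ \Big|\ N_\ell\Big] \le e^{-\Omega(\alpha_{\ell+1}^2)}$$

for given $N_\ell$ with $|N_\ell - n_\ell| \le \Delta_\ell \ll \theta_\ell^3 n$.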

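The second inequality holds, for (3.3) gives

$$\Pr\Big[\exists\, j = 1, \ldots, \ell \ \text{s.t.}\ \big|H_j - 2\theta_j\lambda(1 - e^{-\theta_j\lambda} - \theta_j\lambda e^{-2\theta_j\lambda})n\big| \ge \Delta_j\Big] \le 2\sum_{j=1}^{\ell} e^{-\Omega(\alpha_j^2)} = e^{-\Omega(\alpha_\ell^2)}.$$

□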



Proof of Main Lemma. We first estimate the probability that all light clones disappear during phase $j$. Observe that the number of light clones is bounded by renewal random walk processes with negative drift: If the chosen light clone is matched to another light clone, then the number decreases by 2. If it is matched to a clone of a variable with type larger than or equal to (2,2), the number decreases by 1. If it is matched to a clone of a variable with type $(1,b)$ or $(b,1)$, $b \ge 2$, then the number changes, in expectation, by $-1 + \frac{b}{b+1}$. Thus, there is an absolute constant $h > 0$ such that the expected change in the number of light clones is less than $-h$.

If all light clones disappear during phase $j$, $j = 1, \ldots, a-1$, then either $L_{j+1} \le 0.1\alpha(\theta_{j+1}^3 n)^{1/2}$ or the renewal random walks with negative drift must reach beyond $0.1\alpha(\theta_{j+1}^3 n)^{1/2}$. Lemma 3.2 gives that the probability of the former is $e^{-\Omega(\alpha^2(1-\beta)^{(j-a)/2})}$. For the latter, observe that the total number of walks is less than $N_j$, which is $O(\theta_j^2 n)$ with probability $1 - e^{-\Omega(\alpha^2(1-\beta)^{(j-a)/2})}$. We consider excursions that are segments of the renewal random walks between two consecutive visits to 0. The generalized Chernoff bound, or Corollary 2.4 with $a_i, b_i, \xi, \delta = \Theta(1)$, yields that the probability of each excursion reaching beyond $0.1\alpha(\theta_{j+1}^3 n)^{1/2}$ is at most

$$\sum_{m \ge 1} e^{-\Omega\big(\min\{\alpha(\theta_{j+1}^3 n)^{1/2} + hm,\ (\alpha(\theta_{j+1}^3 n)^{1/2} + hm)^2/m\}\big)} \le \sum_{m \ge 1} e^{-\Omega(\alpha(\theta_{j+1}^3 n)^{1/2}) - \Omega(m)} = e^{-\Omega(\alpha(\theta_{j+1}^3 n)^{1/2})}.$$

As there are at most $N_j$ excursions, such an excursion exists with probability at most $N_j\, e^{-\Omega(\alpha(\theta_{j+1}^3 n)^{1/2})}$. Therefore, the probability that there exist no light clones in a step of the $j$-th phase is at most $e^{-\Omega(\alpha^2(1-\beta)^{(j-a)/2})}$, which yields

$$\Pr\big[\Lambda_c \ge \theta_\lambda\lambda + \alpha(\theta_\lambda n)^{-1/2}\big] \le e^{-\Omega(\alpha^2)},$$

replacing $\alpha$ by $\alpha/\lambda$.

On the other hand, after the $(a-1)$-th phase, there are at most $O(\alpha(\theta_\lambda^3 n)^{1/2})$ light clones and $\Omega(\theta_\lambda^3 n)$ unmatched clones of variables of type larger than (1,1), with probability $1 - e^{-\Omega(\alpha^2)}$. As the number of light clones has negative drift, no light clone exists after $O(\alpha(\theta_\lambda^3 n)^{1/2}) = O\big(\frac{\alpha}{(\theta_\lambda^3 n)^{1/2}}\,\theta_\lambda^3 n\big)$ more clones are matched, with probability $1 - e^{-\Omega(\alpha^2)}$. Thus, the cut-off value when no light clone exists for the first time cannot be smaller than $\big(1 - O\big(\frac{\alpha}{(\theta_\lambda^3 n)^{1/2}}\big)\big)\theta_\lambda\lambda = \theta_\lambda\lambda - O(\alpha(\theta_\lambda n)^{-1/2})$ with probability $1 - e^{-\Omega(\alpha^2)}$, by the cut-off line lemma, or (3.1). Replacing $\alpha$ by $c\alpha$ for an appropriate constant $c$, we conclude that

$$\Pr\big[\Lambda_c \le \theta_\lambda\lambda - \alpha(\theta_\lambda n)^{-1/2}\big] \le e^{-\Omega(\alpha^2)}.$$

□



Let $F^*_c(n,p)$ be the core of $F_{PC}(n,p)$. One may define $C^*_{n,p}$, $C^*_{n,p}(i,j)$, $D^*_{n,p}(i,j)$ and $M^*_{n,p}(i,j)$ for $F_{PC}(n,p)$ as $C_{n,p}$, $C_{n,p}(i,j)$, $D_{n,p}(i,j)$ and $M_{n,p}(i,j)$ are defined for $F(n,p)$. We first estimate $|D^*_{n,p}(i,j)|$ and $M^*_{n,p}(i,j)$. Notice that upper and lower bounds for the $|D^*_{n,p}(i',j')|$'s yield bounds for $|C^*_{n,p}(i,j)|$, as

$$|C^*_{n,p}(i,j)| = |D^*_{n,p}(i,j)| - |D^*_{n,p}(i+1,j)| - |D^*_{n,p}(i,j+1)| + |D^*_{n,p}(i+1,j+1)|.$$
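The inclusion-exclusion identity for the exact-type counts can be checked on a toy list of variable types. The following is only an illustrative sketch: the helper names `count_at_least` and `count_exactly` and the sample data are assumptions made for this example, and "type larger than or equal to $(i,j)$" is read here as a coordinatewise comparison.

```python
from collections import Counter

def count_at_least(types, i, j):
    """|D(i, j)|: number of variables whose type (a, b) has a >= i and b >= j."""
    return sum(1 for a, b in types if a >= i and b >= j)

def count_exactly(types, i, j):
    """|C(i, j)| recovered from four cumulative counts by inclusion-exclusion."""
    return (count_at_least(types, i, j)
            - count_at_least(types, i + 1, j)
            - count_at_least(types, i, j + 1)
            + count_at_least(types, i + 1, j + 1))

# Hypothetical sample: the type (a, b) of each variable in a small core.
sample_types = [(1, 1), (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (1, 1)]
direct_counts = Counter(sample_types)

for i in range(1, 5):
    for j in range(1, 5):
        # The identity must agree with a direct tally of each exact type.
        assert count_exactly(sample_types, i, j) == direct_counts[(i, j)]
```

Because the identity is linear in the four cumulative counts, two-sided bounds on each $|D^*_{n,p}(i',j')|$ translate directly into two-sided bounds on $|C^*_{n,p}(i,j)|$.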

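Let $D_\alpha^{\pm}(i,j)$ be the sets of variables that are of type larger than or equal to $(i,j)$ at $\mu_\alpha^{\pm} := \theta_\lambda\lambda \pm \alpha(\theta_\lambda n)^{-1/2}$, respectively, and let $M_\alpha^{\pm}(i,j)$ be the number of clones of variables in $D_\alpha^{\pm}(i,j)$ less than $\mu_\alpha^{\pm}$, respectively. Then, Lemma 3.1 gives

$$\Pr\big[D_\alpha^-(i,j) \subseteq D^*_{n,p}(i,j) \subseteq D_\alpha^+(i,j) \ \text{for all } i,j\big] \ge 1 - e^{-\Omega(\alpha^2)}$$

and

$$\Pr\big[M_\alpha^-(i,j) \le M^*_{n,p}(i,j) \le M_\alpha^+(i,j) \ \text{for all } i,j\big] \ge 1 - e^{-\Omega(\alpha^2)}.$$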



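Since $|D_\alpha^{\pm}(i,j)|$ and $M_\alpha^{\pm}(i,j)$ are sums of i.i.d. random variables and it is easy to check all the conditions of the generalized Chernoff bound with $a_j, b_j = \Theta(\theta_\lambda^{i+j})$, $\xi = \delta = 1$, we have

$$\Pr\Big[\big|\,|D_\alpha^{\pm}(i,j)| - Q_i(\mu_\alpha^{\pm})Q_j(\mu_\alpha^{\pm})n\big| \ge \Delta\Big] \le 2e^{-\Omega(\min\{\Delta,\ \Delta^2/(\theta_\lambda^{i+j} n)\})},$$

respectively, and

$$\Pr\Big[\big|M_\alpha^{\pm}(i,j) - \mu_\alpha^{\pm}\big(Q_{i-1}(\mu_\alpha^{\pm})Q_j(\mu_\alpha^{\pm}) + Q_i(\mu_\alpha^{\pm})Q_{j-1}(\mu_\alpha^{\pm})\big)n\big| \ge \Delta\Big] \le 2e^{-\Omega(\min\{\Delta,\ \Delta^2/(\theta_\lambda^{i+j} n)\})},$$

respectively. Therefore,

$$\Pr\Big[|D^*_{n,p}(i,j)| - Q_i(\mu_\alpha^+)Q_j(\mu_\alpha^+)n \ge \Delta\Big] \le 2e^{-\Omega(\min\{\Delta,\ \Delta^2/(\theta_\lambda^{i+j} n)\})} + e^{-\Omega(\alpha^2)},$$

$$\Pr\Big[|D^*_{n,p}(i,j)| - Q_i(\mu_\alpha^-)Q_j(\mu_\alpha^-)n \le -\Delta\Big] \le 2e^{-\Omega(\min\{\Delta,\ \Delta^2/(\theta_\lambda^{i+j} n)\})} + e^{-\Omega(\alpha^2)}$$

and

$$\Pr\Big[M^*_{n,p}(i,j) - \mu_\alpha^+\big(Q_{i-1}(\mu_\alpha^+)Q_j(\mu_\alpha^+) + Q_i(\mu_\alpha^+)Q_{j-1}(\mu_\alpha^+)\big)n \ge \Delta\Big] \le 2e^{-\Omega(\min\{\Delta,\ \Delta^2/(\theta_\lambda^{i+j} n)\})} + e^{-\Omega(\alpha^2)},$$

$$\Pr\Big[M^*_{n,p}(i,j) - \mu_\alpha^-\big(Q_{i-1}(\mu_\alpha^-)Q_j(\mu_\alpha^-) + Q_i(\mu_\alpha^-)Q_{j-1}(\mu_\alpha^-)\big)n \le -\Delta\Big] \le 2e^{-\Omega(\min\{\Delta,\ \Delta^2/(\theta_\lambda^{i+j} n)\})} + e^{-\Omega(\alpha^2)}.$$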

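We also have

$$\Pr\big[|D^*_{n,p}(i,j)| \ge \ell\big] \le e^{-\Omega(\alpha^2)} + \Pr\big[|D_\alpha^+(i,j)| \ge \ell\big] \le e^{-\Omega(\alpha^2)} + \binom{n}{\ell}\big(Q_i(\mu_\alpha^+)Q_j(\mu_\alpha^+)\big)^{\ell} \le e^{-\Omega(\alpha^2)} + \frac{\big(Q_i(\mu_\alpha^+)Q_j(\mu_\alpha^+)n\big)^{\ell}}{\ell!}. \qquad (3.5)$$

If $(i,j)$ is fixed, it is easy to see that

$$Q_i(\mu_\alpha^{\pm})Q_j(\mu_\alpha^{\pm}) = Q_i(\theta_\lambda\lambda)Q_j(\theta_\lambda\lambda) + O\big(\alpha\theta_\lambda^{i+j-1}(\theta_\lambda n)^{-1/2}\big),$$

and similarly

$$\mu_\alpha^{\pm}Q_i(\mu_\alpha^{\pm})Q_j(\mu_\alpha^{\pm}) = \theta_\lambda\lambda\, Q_i(\theta_\lambda\lambda)Q_j(\theta_\lambda\lambda) + O\big(\alpha\theta_\lambda^{i+j}(\theta_\lambda n)^{-1/2}\big).$$

Replacing $\alpha$ by $c\alpha$ for an appropriate constant $c > 0$ and taking $\Delta = c\alpha\theta_\lambda^{i+j-2}(\theta_\lambda n)^{1/2} = c\alpha\theta_\lambda^{i+j-3}(\theta_\lambda^3 n)^{1/2}$, we have, for $q_i := Q_i(\theta_\lambda\lambda)$,

$$\Pr\Big[\big|\,|D^*_{n,p}(i,j)| - q_i q_j n\big| \ge \alpha\theta_\lambda^{i+j-3}(\theta_\lambda^3 n)^{1/2}\Big] \le e^{-\Omega(\theta_\lambda^{i+j-3}\alpha^2)} + e^{-\Omega(\alpha^2)}$$

and

$$\Pr\Big[\big|M^*_{n,p}(i,j) - \theta_\lambda\lambda(q_{i-1}q_j + q_i q_{j-1})n\big| \ge \alpha\theta_\lambda^{i+j-3}(\theta_\lambda^3 n)^{1/2}\Big] \le e^{-\Omega(\theta_\lambda^{i+j-3}\alpha^2)} + e^{-\Omega(\alpha^2)}.$$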



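Thus, for $F(n,p)$, Theorem 1.1 gives

Theorem 3.3 Suppose $p(2n-1) = 1 + \sigma$ is uniformly bounded from above with $\sigma \gg n^{-1/3}$. Let $\lambda = 1 + \sigma$, $\Delta > 0$, $1 \le \alpha \le (\theta_\lambda^3 n)^{1/2}$, and $\mu_\alpha^{\pm} = \theta_\lambda\lambda \pm \alpha(\theta_\lambda n)^{-1/2}$, respectively. Then,

$$\Pr\Big[|D_{n,p}(i,j)| - Q_i(\mu_\alpha^+)Q_j(\mu_\alpha^+)n \ge \Delta\Big] \le 2e^{-\Omega(\min\{\Delta,\ \Delta^2/(\theta_\lambda^{i+j} n)\})} + e^{-\Omega(\alpha^2)},$$

$$\Pr\Big[|D_{n,p}(i,j)| - Q_i(\mu_\alpha^-)Q_j(\mu_\alpha^-)n \le -\Delta\Big] \le 2e^{-\Omega(\min\{\Delta,\ \Delta^2/(\theta_\lambda^{i+j} n)\})} + e^{-\Omega(\alpha^2)}$$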



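and

$$\Pr\big[|D_{n,p}(i,j)| \ge \ell\big] \le O\Big(\frac{\big(Q_i(\mu_\alpha^+)Q_j(\mu_\alpha^+)n\big)^{\ell/2}}{(\ell!)^{1/2}}\Big) + e^{-\Omega(\alpha^2)}.$$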



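We also have

$$\Pr\Big[M_{n,p}(i,j) - \mu_\alpha^+\big(Q_{i-1}(\mu_\alpha^+)Q_j(\mu_\alpha^+) + Q_i(\mu_\alpha^+)Q_{j-1}(\mu_\alpha^+)\big)n \ge \Delta\Big] \le 2e^{-\Omega(\min\{\Delta,\ \Delta^2/(\theta_\lambda^{i+j} n)\})} + e^{-\Omega(\alpha^2)},$$

$$\Pr\Big[M_{n,p}(i,j) - \mu_\alpha^-\big(Q_{i-1}(\mu_\alpha^-)Q_j(\mu_\alpha^-) + Q_i(\mu_\alpha^-)Q_{j-1}(\mu_\alpha^-)\big)n \le -\Delta\Big] \le 2e^{-\Omega(\min\{\Delta,\ \Delta^2/(\theta_\lambda^{i+j} n)\})} + e^{-\Omega(\alpha^2)}.$$

In particular, for fixed $(i,j)$,

$$\Pr\Big[\big|\,|D_{n,p}(i,j)| - q_i q_j n\big| \ge \alpha\theta_\lambda^{i+j-3}(\theta_\lambda^3 n)^{1/2}\Big] \le e^{-\Omega(\theta_\lambda^{i+j-3}\alpha^2)} + e^{-\Omega(\alpha^2)}$$

and

$$\Pr\Big[\big|M_{n,p}(i,j) - \theta_\lambda\lambda(q_{i-1}q_j + q_i q_{j-1})n\big| \ge \alpha\theta_\lambda^{i+j-3}(\theta_\lambda^3 n)^{1/2}\Big] \le e^{-\Omega(\theta_\lambda^{i+j-3}\alpha^2)} + e^{-\Omega(\alpha^2)}.$$

□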



Theorem 3.3 together with

$$|C_{n,p}(i,j)| = |D_{n,p}(i,j)| - |D_{n,p}(i+1,j)| - |D_{n,p}(i,j+1)| + |D_{n,p}(i+1,j+1)|$$

implies that

$$\Pr\Big[\big|\,|C_{n,p}(i,j)| - p_i p_j n\big| \ge \alpha\theta_\lambda^{i+j-3}(\theta_\lambda^3 n)^{1/2}\Big] \le e^{-\Omega(\theta_\lambda^{i+j-3}\alpha^2)} + e^{-\Omega(\alpha^2)}.$$

Similarly, if $i > j$, then

$$|D_{n,p}(i,j) \cup D_{n,p}(j,i)| = |D_{n,p}(i,j)| + |D_{n,p}(j,i)| - |D_{n,p}(i,i)|$$

gives

$$\Pr\Big[\big|\,|D_{n,p}(i,j) \cup D_{n,p}(j,i)| - (2q_i q_j - q_i q_i)n\big| \ge \alpha\theta_\lambda^{i+j-3}(\theta_\lambda^3 n)^{1/2}\Big] \le e^{-\Omega(\theta_\lambda^{i+j-3}\alpha^2)} + e^{-\Omega(\alpha^2)}.$$

The last two bounds of Theorem 1.4 are already in Theorem 3.3.



Furthermore, as $C_{n,p} = D_{n,p}(1,1)$, $K_{n,p} = D_{n,p}(2,1) \cup D_{n,p}(1,2)$, $2|F_c(n,p)| = M_{n,p}(1,1)$, $2|F_K(n,p)| = M_{n,p}(1,2) + M_{n,p}(2,1) - M_{n,p}(2,2)$ and $Q_1(\theta_\lambda\lambda) = 1 - e^{-\theta_\lambda\lambda} = \theta_\lambda$, Corollary 1.5 follows from Theorem 1.4.

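Finally, the first two bounds in Corollary 1.6 follow from Theorem 1.4 since $\theta_\lambda = \Theta(\sigma)$. For the last bound, if $\sigma^4 n \le \varepsilon$ for a small positive constant $\varepsilon$, then Theorem 1.4 implies that all variables in the core are of types (1,1), (1,2) or (2,1), with probability $1 - O(\varepsilon^2)$. If $\sigma^4 n \ge \beta$ for a large constant $\beta > 0$, then Theorem 1.4 with $\alpha = \beta^{-0.1}(\theta_\lambda^3 n)^{1/2}$ also gives

$$\Pr\big[M_{n,p}(2,2) - 2\theta_\lambda\lambda\, Q_1(\theta_\lambda\lambda)Q_2(\theta_\lambda\lambda)n \ge \beta^{-0.1}\theta_\lambda^4 n\big] \le e^{-\Omega(\beta^{-0.2}\theta_\lambda^4 n)} \le e^{-\Omega(\beta^{0.8})}.$$

Similar bounds hold for $M_{n,p}(1,3)$ and $M_{n,p}(3,1)$. As $\theta_\lambda = \Theta(\sigma)$ and $\theta_\lambda\lambda\, Q_1(\theta_\lambda\lambda)Q_2(\theta_\lambda\lambda) = \Theta(\theta_\lambda^4)$, the desired bound follows. If $\varepsilon \le \sigma^4 n \le \beta$, we simply use the bound for $\lambda' = 1 + (\beta/n)^{1/4}$, or $p' = \frac{1 + (\beta/n)^{1/4}}{2n-1}$: Since $\Pr[M_{n,p} \ge \mu] \le \Pr[M_{n,p'} \ge \mu]$ and $M_{n,p'} = O(\beta) = O((\beta/\varepsilon)\sigma^4 n)$ with probability $1 - e^{-\Omega(\beta^{0.8})}$, the bound follows.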



4 Scaling Window: Proofs of Theorems 1.3, 1.7 and 1.8

Suppose all $C_{n,p}(i,j)$'s for $(i,j) \ge (1,1)$ are given. The first thing we need to establish is that all 2-SAT formulae with the same $C_{n,p}(i,j)$'s are equally likely to be the core of $F(n,p)$. More generally, it is not hard to show the following.

Theorem 4.1 (Restated) Suppose two formulae have the same number of clauses on the same 
number of underlying variables, and all underlying variables are of type at least (1, 1). Then the 
two are equally likely to be the core of F(n,p). 

Proof. Let $F_1$ and $F_2$ be the two formulae. After an appropriate permutation, we may assume that the two formulae have the same set of underlying variables. Then, a formula having $F_1$ as its core can be mapped to the formula obtained by replacing the clauses of $F_1$ with the clauses of $F_2$. It is easy to see that the core of the formula obtained this way is $F_2$. It is also clear that the map is one-to-one and onto. Furthermore, two formulae mapped to each other have the same number of clauses, which means that the random formula $F(n,p)$ is equally likely to be either of the two formulae. □

We now consider the configuration model for given $C_{n,p}(i,j)$: Similar to the Poisson cloning model, take $i$ clones of $x$ and $j$ clones of $\bar x$ for each variable $x \in C_{n,p}(i,j)$. The uniform random perfect matching on all clones is called the random configuration. The random configuration then yields a 2-SAT multiformula after contractions. The event that the multiformula has neither loops nor multiple clauses is called SIMPLE or SIM. Conditioned on SIM, the random 2-SAT formula has the uniform distribution among all 2-SAT formulae with the same $C_{n,p}(i,j)$'s. This is not difficult to see, as the number of perfect matchings that yield a fixed 2-SAT formula is $\prod_{(i,j)} (i!\,j!)^{|C_{n,p}(i,j)|}$. It is known that the probability of SIM is uniformly bounded away from 0. In particular, for any event $A$ in the uniform model, or equivalently in the configuration model,

$$\Pr[A] = \Pr^*[A \mid SIM] \le \Pr^*[SIM]^{-1}\Pr^*[A] = O(\Pr^*[A]), \qquad (4.1)$$

where the probability $\Pr^*$ is taken over the random configuration without any condition. Hence, as long as constant factors are not a concern, it is enough to bound the desired probabilities in the random configuration without any condition. To clarify terminology, we recall that the configuration model is obtained from the random configuration by conditioning on SIM. For an event $A$ depending on the random configuration only, such as the event that the $i$-th clone of $y$ and the $j$-th clone of $z$ are matched, $\Pr[A]$ may not be well-defined, but $\Pr^*[A]$ or $\Pr^*[A \mid SIM]$ may still be considered.
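The random configuration and the event SIM can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's construction verbatim: the function names are hypothetical, a "loop" is taken to mean a clause whose two clones belong to the same underlying variable, and a "multiple clause" is taken to mean an unordered pair of literals that appears more than once.

```python
import random
from collections import Counter

def make_clones(types):
    """types maps a variable to its type (i, j): i clones of x, j clones of its negation.
    A clone is recorded as the literal (variable, sign), sign True meaning x itself."""
    clones = []
    for var, (i, j) in types.items():
        clones += [(var, True)] * i + [(var, False)] * j
    return clones

def random_configuration(types, rng):
    """Uniform random perfect matching on all clones: shuffle, then pair neighbours."""
    clones = make_clones(types)
    if len(clones) % 2:
        raise ValueError("an odd number of clones has no perfect matching")
    rng.shuffle(clones)
    return [(clones[k], clones[k + 1]) for k in range(0, len(clones), 2)]

def is_simple(clauses):
    """SIM (as assumed here): no loop and no repeated clause."""
    seen = set()
    for a, b in clauses:
        if a[0] == b[0]:                 # both literals on one variable: a loop
            return False
        key = frozenset((a, b))          # clauses are unordered pairs of literals
        if key in seen:                  # the same clause twice: a multiple clause
            return False
        seen.add(key)
    return True

# Hypothetical small instance: two variables of type (1,1), one (2,1), one (1,2).
example_types = {"a": (1, 1), "b": (2, 1), "c": (1, 2), "d": (1, 1)}
example_clauses = random_configuration(example_types, random.Random(0))
```

Sampling repeatedly and keeping only the outcomes with `is_simple(...)` true corresponds to conditioning on SIM, which is why a bound that holds under $\Pr^*$ carries over to $\Pr$ up to the constant factor $\Pr^*[SIM]^{-1}$.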

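We may be able to estimate the probability of SIM in the case that all but few clones are clones of type (1,1) literals: Suppose all $N$ but $o(N^{1/2})$ clones are clones of type (1,1) literals. First, with probability $1 - o(1)$, no pair of clones that are not clones of type (1,1) literals is matched. Thus, the multiformula fails to be simple mainly because the two clones of a type (1,1) variable $x$ and its negation are matched. Let $A_x$ be such an event. Then, for the set $U$ of all type (1,1) variables,

$$\sum_{\ell=1}^{2i} (-1)^{\ell+1} \sum_{W \subseteq U,\ |W| = \ell} \Pr^*\Big[\bigcap_{x \in W} A_x\Big] \le \Pr^*\Big[\bigcup_{x \in U} A_x\Big] \le \sum_{\ell=1}^{2i+1} (-1)^{\ell+1} \sum_{W \subseteq U,\ |W| = \ell} \Pr^*\Big[\bigcap_{x \in W} A_x\Big]$$

for all $i \ge 0$. For $\ell = o(N^{1/2})$ and $|W| = \ell$,

$$\binom{|U|}{\ell} = \binom{N/2 - o(N^{1/2})}{\ell} = (1 + o(1))\frac{(N/2)^{\ell}}{\ell!}$$

and

$$\Pr^*\Big[\bigcap_{x \in W} A_x\Big] = \frac{(N - 2\ell - 1)!!}{(N-1)!!} = \frac{1 + o(1)}{N^{\ell}}.$$

Therefore,

$$\Pr^*\Big[\bigcup_{x \in U} A_x\Big] = 1 - (1 + o(1))e^{-1/2}, \quad \text{and} \quad \Pr^*[SIM] = (1 + o(1))e^{-1/2}. \qquad (4.2)$$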

We are now ready to prove Theorems 1.7 and 1.8.

Proof of Theorem 1.7. We may generate $F(n,p)$ with $\lambda_p := p(2n-1) = 1 - \sigma$ by first taking $F(n,q)$ with $\lambda_q = 1 + n^{-1/3}\log(\sigma^3 n)$ and then independently selecting each clause of $F(n,q)$ with probability $p/q = \frac{1-\sigma}{1 + n^{-1/3}\log(\sigma^3 n)} = 1 - \sigma + o(\sigma)$.

Applying Corollary 1.5 for $\lambda_q$ and $\alpha = \log(\sigma^3 n)$ and using $\theta_q := \theta_{\lambda_q} = 2n^{-1/3}\log(\sigma^3 n) + O(n^{-2/3}\log^2(\sigma^3 n))$, we have

$$|C_{n,q}| = \theta_q^2\lambda_q^2 n + O(\theta_q^3 n) + O((\theta_q n)^{1/2}\log(\sigma^3 n)) = (4 + o(1))\,n^{1/3}\log^2(\sigma^3 n),$$
$$|K_{n,q}| = \theta_q^3\lambda_q^3 n + O(\theta_q^4 n) + O((\theta_q^3 n)^{1/2}\log(\sigma^3 n)) = (8 + o(1))\log^3(\sigma^3 n)$$

and

$$|F_c(n,q)| = \theta_q^2\lambda_q^2 n + O(\theta_q^3 n) + O((\theta_q n)^{1/2}\log(\sigma^3 n)) = (4 + o(1))\,n^{1/3}\log^2(\sigma^3 n),$$
$$|F_K(n,q)| = \tfrac{3}{2}\theta_q^3\lambda_q^3 n + O(\theta_q^4 n) + O((\theta_q^3 n)^{1/2}\log(\sigma^3 n)) = (12 + o(1))\log^3(\sigma^3 n),$$

and, for $K_{n,q}(1,2) := C_{n,q}(1,2) \cup C_{n,q}(2,1)$,

$$|K_{n,q}(1,2)| = (8 + o(1))\log^3(\sigma^3 n)$$

with probability $1 - e^{-\Omega(\log^2(\sigma^3 n))}$. Theorem 3.3 with the same $\alpha$ also gives

$$|D_{n,q}(i,j)| = 0 \quad \text{if } i + j \ge 7 \qquad (4.3)$$

with probability $1 - n^{-7/6 + o(1)}$.

Suppose $C_{n,q}(i,j)$'s are given with $C_{n,q} := \bigcup_{(i,j) \ge (1,1)} C_{n,q}(i,j)$, $K_{n,q} := \bigcup_{(i,j) > (1,1)} C_{n,q}(i,j)$, $D_{n,q}(i,j) := \bigcup_{(i',j') \ge (i,j)} C_{n,q}(i',j')$, and $K_{n,q}(1,2) := C_{n,q}(1,2) \cup C_{n,q}(2,1)$ satisfying the above conditions, and the numbers $M_C$ and $M_K$ of clones of variables in $C_{n,q}$ and $K_{n,q}$, respectively, satisfy

$$M_C = \sum_{(i,j) \ge (1,1)} (i+j)\,|C_{n,q}(i,j)| = (8 + o(1))\,n^{1/3}\log^2(\sigma^3 n),$$

and

$$M_K = \sum_{(i,j) > (1,1)} (i+j)\,|C_{n,q}(i,j)| = (24 + o(1))\log^3(\sigma^3 n).$$

In the corresponding random configuration for given $C_{n,q}(i,j)$'s, two clones $y, z$ of variables in $K_{n,q}$ may yield a clause after resolutions of variables in $C_{n,q}(1,1)$. This occurs if and only if there are $x_1, \ldots, x_\ell \in C_{n,q}(1,1)$ such that $\{w_0 := y, w_1\}, \{\bar w_1, w_2\}, \ldots, \{\bar w_{\ell-1}, w_\ell\}, \{\bar w_\ell, w_{\ell+1} := z\}$ are edges in the random configuration, where $w_i, \bar w_i$ are the two clones of $x_i$ and $\bar x_i$ (not necessarily respectively), including the case $\ell = 0$. If this event occurs, we say that the length $\ell(y,z)$ of $y, z$ is $\ell + 1$, and the edges $\{\bar w_i, w_{i+1}\}$ (resp. the corresponding clauses after contractions) are called intermediate edges (resp. clauses) of the pair. The length $\ell(y,z)$ is infinity if no such $x_i$'s exist. It is easy to see that $\Pr^*[\ell(y,z) = 1] = \frac{1}{M_C - 1}$. Similarly,

$$\Pr^*[\ell(y,z) = 2] = \Big(1 - \frac{M_K - 1}{M_C - 1}\Big)\frac{1}{M_C - 3},$$

and, in general,

$$\Pr^*[\ell(y,z) = \ell] = \Big(1 - \frac{M_K - 1}{M_C - 1}\Big)\Big(1 - \frac{M_K - 1}{M_C - 3}\Big)\cdots\Big(1 - \frac{M_K - 1}{M_C - 2\ell + 3}\Big)\frac{1}{M_C - 2\ell + 1}.$$

For $i = 1, \ldots, 4$ and $\ell_1, \ldots, \ell_i \le \frac{10}{\sigma}\log(\sigma^3 n) \ll n^{1/3}$, the same argument also gives

$$\Pr^*\big[\ell(y_j, z_j) = \ell_j,\ j = 1, \ldots, i\big] = \Big(\frac{1 + o(1)}{8 n^{1/3}\log^2(\sigma^3 n)}\Big)^{i}. \qquad (4.4)$$



Notice that a pair $y, z$ of clones of variables in $K_{n,q}$ yields the corresponding clause in the kernel $F_K(n,p)$ only if $\ell(y,z) < \infty$ and all the $\ell(y,z)$ intermediate clauses of the pair are in $F(n,p)$. Such an event occurs with probability

$$\sum_{\ell \ge 1} (p/q)^{\ell}\,\Pr^*[\ell(y,z) = \ell \mid SIM] = O\Big(\sum_{\ell=1}^{\frac{10}{\sigma}\log(\sigma^3 n)} (1-\sigma)^{\ell}\,\Pr^*[\ell(y,z) = \ell] + (1-\sigma)^{\frac{10}{\sigma}\log(\sigma^3 n)}\Big)$$
$$= O\Big(\sum_{\ell=1}^{\frac{10}{\sigma}\log(\sigma^3 n)} (1-\sigma)^{\ell}\,\Pr^*[\ell(y,z) = \ell]\Big) + O((\sigma^3 n)^{-10}).$$



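Similarly, if the kernel $F_K(n,p)$ of $F(n,p)$ has $i$ or more clauses, $i = 1, \ldots, 4$, then there must be $i$ distinct pairs $\{y_j, z_j\}$ of clones of variables in $K_{n,q}$ such that $\ell(y_j, z_j) < \infty$ and all the $\ell(y_j, z_j)$ intermediate clauses of each pair are in $F(n,p)$, $j = 1, \ldots, i$. The probability of such an event is at most

$$O\Big(M_K^{2i} \sum_{\ell_1, \ldots, \ell_i \ge 1} (p/q)^{\ell_1 + \cdots + \ell_i}\,\Pr\big[\ell(y_j, z_j) = \ell_j,\ j = 1, \ldots, i \mid SIM\big]\Big)$$

for fixed $i$ distinct pairs $\{y_j, z_j\}$ of clones of variables in $K_{n,q}$, $i = 1, \ldots, 4$. Since $M_K = O(\log^3(\sigma^3 n))$ and

$$P_i := \sum_{\ell_1, \ldots, \ell_i \ge 1} (p/q)^{\ell_1 + \cdots + \ell_i}\,\Pr^*\big[\ell(y_j, z_j) = \ell_j,\ j = 1, \ldots, i\big]$$
$$\le \sum_{\ell_1, \ldots, \ell_i = 1}^{\frac{10}{\sigma}\log(\sigma^3 n)} (1-\sigma)^{\ell_1 + \cdots + \ell_i}\,\Pr^*\big[\ell(y_j, z_j) = \ell_j,\ j = 1, \ldots, i\big] + (1-\sigma)^{\frac{10}{\sigma}\log(\sigma^3 n)}$$
$$\le \Big(\frac{1 + o(1)}{8\sigma n^{1/3}\log^2(\sigma^3 n)}\Big)^{i} + (\sigma^3 n)^{-10},$$

we have that

$$\Pr\big[|F_K(n,p)| \ge 4\big] \le (\sigma^3 n)^{-4/3 + o(1)}.$$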

Thus, it is enough to estimate the probability of $|F_K(n,p)| = 2, 3$, since kernels must have two or more clauses. For two variables $w, x$ in $K_{n,q}$, let $A_{wx}$ be the event that $K_{n,p} = \{w, x\}$, $|F_K(n,p)| = 3$ and no variables in $K_{n,q}$ are in intermediate clauses, and let $B_{wx}$ be the event that, in addition to $A_{wx}$, each of the three clauses in $F_K(n,p)$ has at least 1 but not more than $\frac{10}{\sigma}\log(\sigma^3 n)$ intermediate clauses. Clearly, for $K_{n,q}(1,2) := C_{n,q}(1,2) \cup C_{n,q}(2,1)$,

$$\Pr\big[|K_{n,p}| = 2,\ |F_K(n,p)| = 3\big] \ge \Pr\Big[\bigcup_{w, x \in K_{n,q}(1,2),\ w \ne x} B_{wx}\Big]. \qquad (4.5)$$

For an upper bound, if $|K_{n,p}| = 2$, $|F_K(n,p)| = 3$ but $\bigcup_{w, x \in K_{n,q}} A_{wx}$ does not occur, then at least 4 distinct pairs of clones of variables in $K_{n,q}$ must have finite length and all corresponding intermediate clauses of them must be in $F(n,p)$. This probability is at most $O(M_K^8 P_4) = (\sigma^3 n)^{-4/3 + o(1)}$. The probability of $\bigcup_{\{w,x\} \not\subseteq K_{n,q}(1,2)} A_{wx}$ may be bounded by

$$\Pr\Big[\bigcup_{\{w,x\} \not\subseteq K_{n,q}(1,2)} A_{wx}\Big] = O\big(|K_{n,q}|\,(|K_{n,q}| - |K_{n,q}(1,2)|)\,P_3\big) = o(\log^6(\sigma^3 n)\,P_3) = o((\sigma^3 n)^{-1}),$$

for $|K_{n,q}| - |K_{n,q}(1,2)| = o(\log^3(\sigma^3 n))$. Finally,



$$\sum_{w, x \in K_{n,q}(1,2)} \Pr[A_{wx} \setminus B_{wx}] = O((\sigma^2 n)^{-1}).$$

All together, we have

$$\Pr\big[|K_{n,p}| = 2,\ |F_K(n,p)| = 3\big] = \Pr\Big[\bigcup_{w, x \in K_{n,q}(1,2),\ w \ne x} B_{wx}\Big] + o((\sigma^3 n)^{-1}).$$

Moreover, as

$$0 \le \sum_{\substack{w, x \in K_{n,q}(1,2) \\ w \ne x}} \Pr[B_{wx}] - \Pr\Big[\bigcup_{\substack{w, x \in K_{n,q}(1,2) \\ w \ne x}} B_{wx}\Big] \le \sum_{\substack{w, x, w', x' \in K_{n,q}(1,2) \\ w \ne x,\ w' \ne x',\ \{w,x\} \ne \{w',x'\}}} \Pr[B_{wx} \cap B_{w'x'}]$$

and

$$\sum \Pr[B_{wx} \cap B_{w'x'}] = O(|K_{n,q}|^4 P_4) = o((\sigma^3 n)^{-1}),$$

we deduce that

$$\Pr\big[|K_{n,p}| = 2,\ |F_K(n,p)| = 3\big] = \sum_{\substack{w, x \in K_{n,q}(1,2) \\ w \ne x}} \Pr[B_{wx}] + o((\sigma^3 n)^{-1}).$$


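To estimate $\Pr[B_{wx}]$, we may assume that both $w$ and $x$ are of type (2,1), after exchanging the roles of $w, x$ with $\bar w, \bar x$ if needed. In the random configuration, there are $5 \cdot 3 \cdot 1$ ways to match the 6 clones of $w$ and $x$. In each case $i = 1, \ldots, 15$, let $B_i(\ell_1, \ell_2, \ell_3)$ be the event that there are $\ell_1, \ell_2, \ell_3$ intermediate variables in $K_{n,q}$ for the three matches. Then

$$\Pr[B_{wx}] = \sum_{i=1}^{15} \sum_{\ell_1, \ell_2, \ell_3 \ge 2}^{\frac{10}{\sigma}\log(\sigma^3 n)} (1 - \sigma + o(\sigma))^{\ell_1 + \ell_2 + \ell_3}\,\Pr^*[B_i(\ell_1, \ell_2, \ell_3) \mid SIM].$$

For $\ell_1, \ell_2, \ell_3$ in the above range, (4.2) and (4.3) give

$$\Pr^*[B_i(\ell_1, \ell_2, \ell_3) \mid SIM] = \frac{\Pr^*[B_i(\ell_1, \ell_2, \ell_3) \cap SIM]}{\Pr^*[SIM]} = \Big(\frac{1 + o(1)}{8 n^{1/3}\log^2(\sigma^3 n)}\Big)^{3} \frac{\Pr^*[SIM']}{\Pr^*[SIM]} = \Big(\frac{1 + o(1)}{8 n^{1/3}\log^2(\sigma^3 n)}\Big)^{3},$$

where $SIM'$ is the event that the random configuration on the $M_C - 6 - 2(\ell_1 + \ell_2 + \ell_3)$ clones is simple. Hence

$$\Pr[B_{wx}] = \frac{15 + o(1)}{8^3 \sigma^3 n \log^6(\sigma^3 n)}$$

and

$$\Pr\big[|K_{n,p}| = 2,\ |F_K(n,p)| = 3\big] = \binom{|K_{n,q}(1,2)|}{2}\frac{15 + o(1)}{8^3 \sigma^3 n \log^6(\sigma^3 n)} + o((\sigma^3 n)^{-1}) = \frac{15 + o(1)}{16\sigma^3 n}.$$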



If $|K_{n,p}| = 1$, then there are at least two clauses in $F_K(n,p)$. Appealing directly to $F(n,p)$ with $p = \frac{1-\sigma}{2n-1}$,

$$\Pr\big[|K_{n,p}| = 1\big] = O\Big(n \sum_{\ell_1, \ell_2 \ge 1} \binom{n}{\ell_1 - 1} 2^{\ell_1 - 1}(\ell_1 - 1)!\, \binom{n}{\ell_2 - 1} 2^{\ell_2 - 1}(\ell_2 - 1)!\, \Big(\frac{1-\sigma}{2n-1}\Big)^{\ell_1 + \ell_2}\Big) = O\Big(\frac{1}{\sigma^2 n}\Big),$$

where $\ell_1$ and $\ell_2$ represent the numbers of intermediate clauses.

When the event $B_{wx}$ occurs for variables $w, x$ of type (2,1), the only way (out of the 15 ways) it directly makes the formula unsatisfiable is the case that the two clones of each of $w$ and $x$ are matched and the two clones of $\bar w$ and $\bar x$ are matched. Therefore, the same argument yields

$$\Pr[F(n,p) \text{ is UNSAT}] = \frac{1 + o(1)}{16\sigma^3 n}.$$

□

Proof of Theorem 1.8. For $\lambda = 1 + \sigma$, $F_K(n,p)$ has $\Omega(\theta_\lambda^3 n)$ variables with probability $1 - e^{-\Omega(\theta_\lambda^3 n)}$, by Theorem 3.3 and the convention (see Section 1). Exchanging the roles of $x$ and $\bar x$, if necessary, we may assume that the number of $x$-clones is at least as large as the number of $\bar x$-clones for all variables $x \in K_{n,p}$. For the lower bound, let $Y$ be the set of clones of $x$'s and $Z$ be the set of clones of $\bar x$'s. Then

$$|Y| \ge |Z|.$$

We now consider the event that all clones in $Z$ are matched to clones in $Y$, in which case $(1, \ldots, 1)$ is a satisfying assignment. The probability of the event is

$$\frac{|Y|}{|Y| + |Z| - 1} \cdot \frac{|Y| - 1}{|Y| + |Z| - 3} \cdots \frac{|Y| - |Z| + 1}{|Y| - |Z| + 1} \ge 2^{-|Z|} \ge e^{-O(\theta_\lambda^3 n)} = e^{-O(\sigma^3 n)}.$$
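The product above can be checked against a brute-force enumeration for very small clone sets. The sketch below is illustrative only; the function names are hypothetical, and the enumeration treats all clones as distinguishable points of a uniform random perfect matching.

```python
from fractions import Fraction

def prob_all_z_matched_to_y(y, z):
    """Product formula: the k-th Z-clone finds a Y-partner with probability
    (y - k) / (y + z - 1 - 2k), for k = 0, ..., z - 1."""
    if (y + z) % 2:
        raise ValueError("an odd number of clones has no perfect matching")
    p = Fraction(1)
    for k in range(z):
        p *= Fraction(y - k, y + z - 1 - 2 * k)
    return p

def perfect_matchings(items):
    """Yield every perfect matching of the list `items` as a list of pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for k, partner in enumerate(rest):
        for tail in perfect_matchings(rest[:k] + rest[k + 1:]):
            yield [(first, partner)] + tail

def brute_force_prob(y, z):
    """Fraction of perfect matchings with no edge joining two Z-clones."""
    clones = [("Y", k) for k in range(y)] + [("Z", k) for k in range(z)]
    total = good = 0
    for matching in perfect_matchings(clones):
        total += 1
        good += all(not (a[0] == "Z" and b[0] == "Z") for a, b in matching)
    return Fraction(good, total)
```

For instance, with 4 clones in $Y$ and 2 in $Z$, both functions give 4/5: of the 15 perfect matchings on 6 clones, exactly 3 contain the single possible $Z$-$Z$ edge. Each factor of the product is at least 1/2 whenever $|Y| \ge |Z|$, which is the source of the lower bound $2^{-|Z|}$.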

For the upper bound, we may assume $\sigma \le 0.01$. Since the probability decreases as $\sigma$ increases, once the probability is at most $e^{-\Omega(\sigma^3 n)}$ for $\sigma = 0.01$, the probability is at most $e^{-\Omega(n)}$ for larger $\sigma$'s. Corollary 1.5 implies that, with probability $1 - e^{-\Omega(\theta_\lambda^3 n)}$,

$$|K_{n,p}| \ge 0.99\,\theta_\lambda^3 n \quad \text{and} \quad M_{n,p}(2,2) + M_{n,p}(1,3) + M_{n,p}(3,1) \le 0.01\,\theta_\lambda^3 n. \qquad (4.6)$$

It is enough to show the desired bound in the random configuration satisfying (4.6), as $\Pr^*[SIM] = \Omega(1)$. We first take the following procedure to make the problem simpler. Remove all the clauses in the kernel $F_K(n,p)$ containing any variable not in $K_{n,p}(1,2)$ and its negation. (Recall $K_{n,p}(1,2) = C_{n,p}(1,2) \cup C_{n,p}(2,1)$.) Then (4.6) implies that there are at most $0.01\,\theta_\lambda^3 n$ such clauses. Furthermore, as at most one variable in $K_{n,p}(1,2)$ is affected by one such clause, at most $0.02\,\theta_\lambda^3 n$ pure clones can be created. We now apply PLA: Each time a pure clone is matched, the number of pure clones decreases by 2 if it is matched to another pure clone. If it is matched to a non-pure clone, two clones become pure with probability 1/3, and a variable becomes of type (1,1) with the remaining probability. This is so since all non-pure variables are of type (1,2) or (2,1). Hence, after each step, the number of pure clones decreases by at least 1/3 in expectation, and increases by no more than 1 in any case. The generalized Chernoff bound implies that no pure clone is left within $0.07\,\theta_\lambda^3 n$ steps, with probability $1 - e^{-\Omega(\theta_\lambda^3 n)}$. Therefore, at least $0.9\,\theta_\lambda^3 n$ variables of type (1,2) or (2,1) remain after PLA stops, with probability $1 - e^{-\Omega(\theta_\lambda^3 n)}$. The desired bound may be obtained from the next lemma.
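The drift of 1/3 used in the PLA step can be written out explicitly. The computation below is a worked illustration under the assumption, taken from the argument above, that a non-pure partner is uniformly one of the three clones of a variable of type (1,2) or (2,1); the symbol $X$ for the one-step change in the number of pure clones is introduced here only for this illustration.

```latex
% Partner is the lone clone of its sign (probability 1/3):
%   the matched pure clone is removed (-1) and the two remaining clones become pure (+2).
% Partner is one of the two same-sign clones (probability 2/3):
%   the matched pure clone is removed (-1) and the variable becomes of type (1,1).
\mathbb{E}[X \mid \text{partner is non-pure}]
  = \tfrac{1}{3}\,(-1 + 2) + \tfrac{2}{3}\,(-1)
  = -\tfrac{1}{3},
\qquad
X = -2 \ \text{ if the partner is pure.}
```

In every case $X \le 1$, which is the bounded-increment condition needed for the generalized Chernoff bound.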

Lemma 4.2 Let $F(b)$ be the (multi)formula yielded by the random configuration on $b$ variables of type (1,2) or (2,1). Then

$$\Pr[F(b) \text{ is SAT}] = o(e^{-0.02b}),$$

as $b \to \infty$.

Proof. After exchanging the roles of $x$ and $\bar x$ as needed, we may assume that all $b$ variables are of type (2,1). Notice that an assignment for the $b$ variables may be regarded as a 0,1 vector of length $b$. That is, its $i$-th coordinate gives the truth value of the $i$-th variable, say $x_i$. Suppose an assignment has exactly $tb$ 0's. Then, there are $2tb + (1-t)b$ clones whose truth values are set to 0. These clones are called negative. The other clones are set to 1 and will be called positive. The assignment is a satisfying assignment if and only if there is no edge connecting two negative clones. We call such an edge bad. A clause corresponding to a bad edge is also called bad.

If $F := F(b)$ is satisfiable, then there are assignments that yield no bad clause. Among those assignments, we may take one with the maximum number of 1's. Such an assignment is called maximal. Suppose a satisfying assignment $s = (s_i)$ is maximal. Then, for a variable $x_i$ with $s_i = 0$, the only clone of $\bar x_i$, which is a positive clone (with respect to $s$), must be connected to a negative clone. Otherwise, the assignment $s^*$ obtained from $s$ by changing the value of $s_i$ to 1 is another satisfying assignment, which implies that $s$ is not maximal.

Summarizing, we have the following. Provided $s$ has $tb$ 0's, the number $N$ of negative clones is $2tb + (1-t)b = (1+t)b$ and the number $M$ of positive clones is $2(1-t)b + tb = (2-t)b$. If $s$ is a maximal satisfying assignment, then there is no bad clause and the positive clone of each $\bar x_i$ with $s_i = 0$ must be matched to a negative clone. The number $L$ of positive clones of $\bar x_i$ with $s_i = 0$ is $tb$. Since all $N$ negative clones must be matched to positive clones, and the $L$ positive clones mentioned above must be matched to negative clones, and the number of perfect matchings on $m$ clones for even $m$ is

$$(m-1)!! = \frac{m!}{2^{m/2}(m/2)!},$$

we have that

$$P(s) := \Pr[s \text{ is a maximal satisfying assignment}] = \frac{\binom{M-L}{N-L}\,N!\,(M-N-1)!!}{(M+N-1)!!},$$

where

$$N = (1+t)b, \quad M = (2-t)b, \quad L = tb$$

(provided $s$ has $tb$ 0's). Using Stirling's formula, we have

$$P(s) \le b\exp\Big(2(1-t)b\,H\big(\tfrac{1}{2(1-t)}\big) + N\ln N + \tfrac{M-N}{2}\ln(M-N) - \tfrac{M+N}{2}\ln(M+N)\Big)$$
$$= b\exp\Big(2(1-t)b\,H\big(\tfrac{1}{2(1-t)}\big) + (1+t)b\ln\tfrac{1+t}{3} + \tfrac{1-2t}{2}\,b\ln\tfrac{1-2t}{3}\Big).$$
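The count behind $P(s)$ can be verified by exhaustive enumeration for a handful of variables. The sketch below is illustrative only, with hypothetical function names: it counts the perfect matchings in which every negative clone is matched to a positive clone and the positive clone of each $\bar x_i$ with $s_i = 0$ is matched to a negative clone, and compares the result with the numerator $\binom{M-L}{N-L} N! (M-N-1)!!$.

```python
from math import comb, factorial

def double_factorial(k):
    """k!! for odd k >= -1, with (-1)!! = 1; (m - 1)!! counts perfect matchings on m points."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def closed_form_count(s):
    """Numerator of P(s) for b = len(s) variables, all of type (2,1)."""
    zeros = s.count(0)
    n_neg = 2 * zeros + (len(s) - zeros)      # N = (1 + t) b
    n_pos = 2 * (len(s) - zeros) + zeros      # M = (2 - t) b
    if n_neg > n_pos:
        return 0                              # some negative clone lacks a positive partner
    return (comb(n_pos - zeros, n_neg - zeros) * factorial(n_neg)
            * double_factorial(n_pos - n_neg - 1))

def perfect_matchings(items):
    """Yield every perfect matching of the list `items` as a list of pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for k, partner in enumerate(rest):
        for tail in perfect_matchings(rest[:k] + rest[k + 1:]):
            yield [(first, partner)] + tail

def brute_force_count(s):
    """Count matchings with no bad edge in which each forced positive clone meets a negative one."""
    # Clone = (variable index, sign, copy): two clones of x_i and one clone of its negation.
    clones = [c for i in range(len(s)) for c in ((i, True, 0), (i, True, 1), (i, False, 0))]

    def negative(clone):
        i, sign, _ = clone
        return (s[i] == 1) != sign            # the literal evaluates to 0 under s

    def forced(clone):
        i, sign, _ = clone
        return s[i] == 0 and not sign         # positive clone of the negation of x_i, s_i = 0

    count = 0
    for matching in perfect_matchings(clones):
        ok = True
        for a, b in matching:
            if negative(a) and negative(b):
                ok = False                    # bad edge
            elif (forced(a) and not negative(b)) or (forced(b) and not negative(a)):
                ok = False                    # forced clone matched to a positive clone
            if not ok:
                break
        count += ok
    return count
```

For $b = 4$ and $s = (0, 1, 1, 1)$ the enumeration runs over $11!! = 10395$ matchings; since $M + N = 3b$ must be even for a perfect matching to exist, only even $b$ makes sense here.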

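Finally, by $\binom{b}{tb} \le e^{bH(t)}$ and

$$\max_{0 \le t \le 1/2}\Big[H(t) + 2(1-t)H\big(\tfrac{1}{2(1-t)}\big) + (1+t)\ln\tfrac{1+t}{3} + \tfrac{1-2t}{2}\ln\tfrac{1-2t}{3}\Big] \le -0.02,$$

we have

$$\Pr[F \text{ is SAT}] \le \sum_{t} \binom{b}{tb} P(s) = o(e^{-0.02b}).$$

□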
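The numerical claim that the exponent stays below $-0.02$ on $0 \le t \le 1/2$ can be sanity-checked on a grid. This is a rough sketch rather than a proof: $H$ is taken to be the natural-logarithm binary entropy, which is an assumption consistent with the bound $\binom{b}{tb} \le e^{bH(t)}$, and the grid spacing of 0.001 is an arbitrary choice.

```python
from math import log

def xlogx(x):
    """x * ln(x), extended by continuity to 0 at x = 0."""
    return 0.0 if x <= 0.0 else x * log(x)

def entropy(x):
    """Natural-log binary entropy H(x) = -x ln x - (1 - x) ln(1 - x)."""
    return -xlogx(x) - xlogx(1.0 - x)

def exponent(t):
    """H(t) + 2(1-t) H(1/(2(1-t))) + (1+t) ln((1+t)/3) + ((1-2t)/2) ln((1-2t)/3)."""
    a = 2.0 * (1.0 - t)
    u = 1.0 - 2.0 * t
    return (entropy(t)
            + a * entropy(1.0 / a)
            + (1.0 + t) * log((1.0 + t) / 3.0)
            + 0.5 * (xlogx(u) - u * log(3.0)))

grid = [k / 1000 for k in range(501)]          # t = 0, 0.001, ..., 0.5
best_t = max(grid, key=exponent)
best_value = exponent(best_t)
```

On this grid the maximum is attained at an interior point and is expected to sit slightly below $-0.02$, while the endpoint values at $t = 0$ and $t = 1/2$ are much more negative.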